ANTHRAX TOXIN FUSION PROTEINS AND USES THEREOF



This application is a continuation-in-part of application
Serial No. 08/021,601, filed February 12, 1993.

BACKGROUND OF THE INVENTION 

The targeting of cytotoxic or other moieties to
specific cell types has been proposed as a method of treating
diseases such as cancer. Various toxins, including diphtheria
toxin and Pseudomonas exotoxin A, have been suggested as
candidate toxins for this type of treatment. A difficulty
with such methods has been the inability to selectively
target specific cell types for the delivery of toxins or
other active moieties.

One method of targeting specific cells has been to
make fusion proteins of a toxin and a single-chain antibody. A
single-chain antibody (sFv) consists of an antibody light
chain variable domain (VL) and heavy chain variable domain
(VH) connected by a short peptide linker that allows the
structure to assume a conformation capable of binding
antigen. In a diagnostic or therapeutic setting, an sFv may
offer attractive advantages over a monoclonal antibody
(MoAb). These advantages include more rapid tumor penetration
with concomitantly low retention in non-targeted organs
(Yokota et al. Cancer Res. 52:3402, 1992), extremely rapid
plasma and whole-body clearance (resulting in high
tumor-to-normal tissue partitioning) in the course of imaging
studies (Colcher et al. Natl. Cancer Inst. 82:1191, 1990;
Milenic et al. Cancer Res. 51:6363, 1991), and relatively low
cost of production and ease of manipulation at the genetic
level (Huston et al. Methods Enzymol. 203:46, 1991; Johnson,
S. and Bird, R. E. Methods Enzymol. 203:88, 1991). In
addition, sFv-toxin fusion proteins have been shown to exhibit
enhanced anti-tumor activity in comparison with conventional
chemically cross-linked conjugates (Chaudhary et al. Nature
339:394, 1989; Batra et al. Cell. Biol. 11:2200-2295, 1991).
Among the first sFv to be generated were molecules capable of
binding haptens (Bird et al. Science 242:423, 1988; Huston et
al. Proc. Natl. Acad. Sci. USA 85:5879, 1988), cell-surface
receptors (Chaudhary et al., 1989), and tumor antigens
(Chaudhary et al. Proc. Natl. Acad. Sci. USA 87:1066, 1990;
Colcher et al., 1990).

The gene encoding an sFv can be assembled in one of
two ways: (i) by de novo construction from chemically
synthesized overlapping oligonucleotides, or (ii) by
polymerase chain reaction (PCR)-based cloning of VL and VH
genes from hybridoma cDNA. The main disadvantages of the
first approach are the considerable expense of oligonucleotide
synthesis and the requirement that the VL and VH sequences be
known before gene assembly is possible. Consequently, most of
the sFv reported to date were generated by cloning from
hybridoma cDNA. This approach also has inherent
disadvantages, however: it requires availability of the parent
hybridoma or myeloma cell line, and problems are often
encountered when attempting to retrieve the correct V region
genes from heterologous cDNA. For example, hybridomas in which
the immortalizing fusion partner is derived from MOPC-21 may
express a VL kappa transcript that is aberrantly rearranged at
the VJ recombination site and therefore encodes a
non-functional light chain (Cabilly & Riggs, 1985; Carroll et
al., 1988). Cellular levels of this transcript may exceed
those of the transcript from the productive VL gene, so that a
large proportion of the product of PCR amplification of
hybridoma cDNA will not encode a functional light chain. A
second disadvantage of the PCR-based method, frequently
encountered by the inventors, is the variable success in
recovering VH genes using the conditions so far reported in
the literature, presumably because mismatches between the
primers and the target sequence destabilize the hybrid to an
extent that inhibits PCR amplification.

Thus, methods of targeting toxins to specific cells
using single-chain antibodies have been difficult to practice
because of the difficulties in obtaining single-chain
antibodies and other targeting moieties. Also, none of the
proposed treatment methods has been fully successful, because
the need to fuse the toxin to the targeting moiety disrupts
either the toxin function or the targeting function. Thus, a
need exists for a means to target molecules having a desired
activity to a specific cell population.

Bacterial and plant protein toxins have evolved
novel and efficient strategies for penetrating to the cytosol
of mammalian cells, and this ability has been exploited to
develop anti-tumor and anti-HIV cytotoxic agents. Examples
include ricin and Pseudomonas exotoxin A (PE) chimeric toxins
and immunotoxins.

Pseudomonas exotoxin A (PE) is a toxin for which a
detailed analysis of functional domains exists. The sequence
is deposited with GenBank. Structural determination by X-ray
diffraction, expression of deleted proteins, and extensive
mutagenesis studies have defined three functional domains in
PE: a receptor-binding domain (residues 1-252 and 365-399,
designated Ia and Ib, respectively), a central translocation
domain (amino acids 253-364, domain II), and a
carboxyl-terminal enzymatic domain (amino acids 400-613,
domain III). Domain III catalyzes the ADP-ribosylation of
elongation factor 2 (EF-2), which results in inhibition of
protein synthesis and cell death. Recently it was also found
that an extreme carboxyl-terminal sequence is essential for
toxicity (Chaudhary et al. Proc. Natl. Acad. Sci. U.S.A.
87:308-312, 1990; Seetharam et al. J. Biol. Chem.
266:17376-17381, 1991). Since this sequence is similar to the
sequence that specifies retention of proteins in the
endoplasmic reticulum (ER) (Munro, S. and Pelham, H.R.B. Cell
48:899-907, 1987), it was suggested that PE must pass through
the ER to gain access to the cytosol. Detailed knowledge of
the structure of PE has facilitated use of domains II, Ib, and
III (together designated PE40) in hybrid toxins and
immunotoxins.
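As a reading aid only, the PE domain boundaries listed above can be captured in a small lookup table. The sketch below is written in Python as an assumption of this illustration (the specification itself contains no code), and the names `PE_DOMAINS`, `domain_of` and `domains_in` are hypothetical. It uses only the residue ranges stated in the text and shows, for example, that PE40 (residues 253-613) spans domains II, Ib and III.

```python
# PE functional domains as stated in the text (1-based, inclusive ranges).
# Domains Ia and Ib together form the receptor-binding domain.
PE_DOMAINS = {
    "Ia (receptor binding)": (1, 252),
    "II (translocation)": (253, 364),
    "Ib (receptor binding)": (365, 399),
    "III (ADP-ribosylation)": (400, 613),
}


def domain_of(residue: int) -> str:
    """Return the name of the PE domain containing the given residue."""
    for name, (start, end) in PE_DOMAINS.items():
        if start <= residue <= end:
            return name
    raise ValueError(f"residue {residue} is outside PE residues 1-613")


def domains_in(start: int, end: int) -> list[str]:
    """List the domains overlapped by residues start..end (inclusive)."""
    return [name for name, (s, e) in PE_DOMAINS.items() if s <= end and e >= start]


# PE40 consists of domains II, Ib and III, i.e. residues 253-613.
print(domains_in(253, 613))
```

The same helper can be applied to the PE segments used in the LF fusions described later; for instance, `domains_in(401, 602)` returns only domain III.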

Bacillus anthracis produces three proteins which,
when combined appropriately, form two potent toxins,
collectively designated anthrax toxin. Protective antigen
(PA, 82,684 Da (SEQ ID NOS: 3 and 4)) and edema factor (EF,
89,840 Da) combine to form edema toxin (ET), while PA and
lethal factor (LF, 90,237 Da (SEQ ID NOS: 1 and 2)) combine to
form lethal toxin (LT) (Leppla, S.H., in Alouf, J.E. and
Freer, J.H., eds., Academic Press, London, 277-302, 1991).
ET and LT each conform to the AB toxin model, with PA
providing the target cell binding (B) function and EF or LF
acting as the effector or catalytic (A) moiety. A unique
feature of these toxins is that LF and EF have no toxicity in
the absence of PA, apparently because they cannot gain access
to the cytosol of eukaryotic cells.

The genes for each of the three anthrax toxin
components have been cloned and sequenced (Leppla, 1991).
The sequences showed that LF and EF have extensive homology in
amino acid residues 1-300. Since LF and EF compete for binding
to PA63, it is highly likely that these amino-terminal regions
are responsible for binding to PA63. Direct evidence for this
was provided in a recent mutagenesis study (Quinn et al. J.
Biol. Chem. 266:20124-20130, 1991): all mutations made within
amino acid residues 1-210 of LF led to decreased binding to
PA63. The same study also suggested that the putative
catalytic domain of LF included residues 491-776 (Quinn et
al., 1991). In contrast, the location of functional domains
within the PA63 polypeptide is not obvious from inspection of
the deduced amino acid sequence. However, studies with
monoclonal antibodies and protease fragments (Leppla, 1991)
and subsequent mutagenesis studies (Singh et al. J. Biol.
Chem. 266:15493-15497, 1991) showed that residues at and near
the carboxyl terminus of PA are involved in binding to the
receptor.

PA is capable of binding to the surface of many
types of cells. After PA binds to a specific receptor
(Leppla, 1991) on the surface of susceptible cells, it is
cleaved at a single site by a cell-surface protease, probably
furin, to produce an amino-terminal 19-kDa fragment that is
released from the receptor/PA complex (Singh et al. J. Biol.
Chem. 264:19103-19107, 1989). Removal of this fragment from
PA exposes a high-affinity binding site for LF and EF on the
receptor-bound 63-kDa carboxyl-terminal fragment (PA63). The
complex of PA63 and LF or EF enters cells and probably passes
through acidified endosomes to reach the cytosol.

Cleavage of PA occurs after residues 164-167,
Arg-Lys-Lys-Arg. This site is also susceptible to cleavage by
trypsin and can be referred to as the trypsin cleavage site.
Only after cleavage is PA able to bind either EF or LF to form
ET or LT, respectively.
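The residue numbering can be made concrete with a short sketch. Assuming 1-based residue numbering, cleavage "after residues 164-167" cuts the peptide bond between residue 167 and residue 168. The hypothetical Python helper `cleavage_positions` below reports, for each occurrence of a motif, the residue after which cleavage would occur. The toy sequence is invented for illustration and is not the PA sequence of SEQ ID NO: 4.

```python
def cleavage_positions(protein: str, motif: str) -> list[int]:
    """Return, for every (possibly overlapping) occurrence of motif, the
    1-based number of the residue after which the protein would be cleaved,
    i.e. the last residue of the motif."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        # 0-based start + motif length = 1-based number of the motif's last residue
        positions.append(start + len(motif))
        start = protein.find(motif, start + 1)
    return positions


# Hypothetical toy sequence (not PA): 163 filler residues, then RKKR at 164-167.
toy = "A" * 163 + "RKKR" + "SGGS"
site = cleavage_positions(toy, "RKKR")[0]
n_fragment, c_fragment = toy[:site], toy[site:]
print(site, len(n_fragment), c_fragment[0])  # prints 167 167 S
```

Searching from `start + 1` rather than `start + len(motif)` is a deliberate choice so that overlapping occurrences are also reported.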

Prior work had shown that the carboxyl-terminal PA
fragment (PA63) can form ion-conductive channels in artificial
lipid membranes (Blaustein et al. Proc. Natl. Acad. Sci.
U.S.A. 86:2209-2213, 1989; Koehler, T. M. and Collier, R.J.
Mol. Microbiol. 5:1501-1506, 1991), and that LF bound to PA63
on cell-surface receptors can be artificially translocated
across the plasma membrane to the cytosol by acidification of
the culture medium (Friedlander, A. M. J. Biol. Chem.
261:7123-7126, 1986). Furthermore, drugs that block endosome
acidification protect cells from LF (Gordon et al. J. Biol.
Chem. 264:14792-14796, 1989; Friedlander, 1986; Gordon et al.
Infect. Immun. 56:1066-1069, 1988). The mechanisms by which
EF is internalized have been studied in cultured cells by
measuring the increases in cAMP concentrations induced by PA
and EF (Leppla, S. H. Proc. Natl. Acad. Sci. U.S.A.
79:3162-3166, 1982; Gordon et al., 1989). However, because
assays of cAMP are relatively expensive and not highly
precise, this is not a convenient method of analysis.
Internalization of LF has been analyzed only in mouse and rat
macrophages, because these are the only cell types lysed by
lethal toxin.

SUMMARY OF THE INVENTION

The present invention provides a nucleic acid
encoding a fusion protein comprising a nucleotide sequence
encoding the PA binding domain of the native LF protein and a
nucleotide sequence encoding an activity inducing domain of a
second protein. Also provided is a nucleic acid encoding a
fusion protein comprising a nucleotide sequence encoding the
translocation domain and LF binding domain of the native PA
protein and a nucleotide sequence encoding a ligand domain
which specifically binds a cellular target. Proteins encoded
by the nucleic acids of the invention, vectors comprising the
nucleic acids, and hosts capable of expressing the proteins
encoded by the nucleic acids are also provided.

A composition comprising the PA binding domain of
the native LF protein chemically attached to an activity
inducing moiety is further provided.

A method for delivering an activity to a cell is
provided. The steps of the method include administering to
the cell (a) a protein comprising the translocation domain and
the LF binding domain of the native PA protein and a ligand
domain, and (b) a product comprising the PA binding domain of
the native LF protein and a non-LF activity inducing moiety,
whereby the product administered in step (b) is internalized
into the cell and performs the activity within the cell.

Characteristics unique to anthrax toxin are
exploited to make novel cell-specific cytotoxins. A site in
the PA protein of the toxin which must be proteolytically
cleaved for the activity-inducing moiety of the toxin to enter
the cell is replaced by the consensus sequence recognized by a
specific protease. Thus, the toxin will act only on cells
infected with intracellular pathogens that make that specific
protease.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a graph of the percentage to which mutant
proteins are cleaved by purified HIV-1 protease. The mutant
proteins include protective antigen (PA) mutated to include
the HIV-1 protease cleavage site in place of the natural
trypsin cleavage site.

DESCRIPTION OF THE PREFERRED EMBODIMENT 
Nucleic Acids 

Lethal Factor (LF) 
The present invention provides an isolated nucleic
acid encoding a fusion protein comprising a nucleotide
sequence encoding the PA binding domain of the native LF
protein and a nucleotide sequence encoding an activity
inducing domain of a second protein. The LF gene and native
LF protein are shown in SEQ ID NOS: 1 and 2, respectively. The
PA gene and native PA protein are shown in SEQ ID NOS: 3 and 4,
respectively.

The second protein can be a toxin, for example
Pseudomonas exotoxin A (PE), the A chain of diphtheria toxin,
or Shiga toxin. The activity inducing domains of numerous
other known toxins can be included in the fusion protein
encoded by the presently claimed nucleic acid. The activity
inducing domain need not be a toxin, but can have other
activities, including but not limited to stimulating or
reducing growth, selectively inhibiting DNA replication,
providing a desired gene, providing enzymatic activity, or
providing a source of radiation. In any case, the fusion
proteins encoded by the nucleic acids of the present invention
must be capable of being internalized and capable of
expressing the specified activity in a cell. A given LF
fusion protein of the present invention can be tested for its
ability to be internalized and to express the desired activity
using methods as described herein, particularly in Examples 1
and 2.

An example of a nucleic acid of the invention
comprises the nucleotide sequence defined in the Sequence
Listing as SEQ ID NO: 5. This nucleic acid encodes a fusion
of LF residues 1-254 with the two-residue linker "TR" and PE
residues 401-602 (SEQ ID NO: 6). The protein includes a
Met-Val-Pro sequence at the beginning of the LF sequence.
Means for obtaining this fusion protein are further described
below and in Example 1.

A further example of a nucleic acid of this
invention comprises the nucleotide sequence defined in the
Sequence Listing as SEQ ID NO: 7. This nucleic acid encodes a
fusion of LF residues 1-254 with the two-residue linker "TR"
and PE residues 398-613 (SEQ ID NO: 8). The junction
containing the "TR" is the sequence LTRA, and the Met-Val-Pro
sequence is also present. This fusion protein and methods for
obtaining it are further described below and in Example 2.

Another example of the nucleic acid of the present
invention comprises the nucleotide sequence defined in the
Sequence Listing as SEQ ID NO: 9. This nucleic acid encodes a
fusion of LF residues 1-254 with the two-residue linker and
PE residues 362-613 (SEQ ID NO: 10). This fusion protein is
further described in Example 1.

Alternatively, the nucleic acid can include the
entire coding sequence for the LF protein fused to a non-LF
activity inducing domain. Other LF fusion proteins of various
sizes, and methods of making them and testing them for the
desired activity, are also provided herein, particularly in
Examples 1 and 2.

Protective Antigen (PA) 

Also provided is an isolated nucleic acid encoding a
fusion protein comprising a nucleotide sequence encoding the
translocation domain and LF binding domain of the native PA
protein and a nucleotide sequence encoding a ligand domain
which specifically binds a cellular target.

An example of a nucleic acid of this invention
comprises the nucleotide sequence defined in the Sequence
Listing as SEQ ID NO: 11. This nucleic acid encodes a fusion
of PA residues 1-725 and human CD4 residues 1-178, the portion
which binds to gp120 exposed on HIV-1-infected cells (SEQ ID
NO: 12). This fusion protein and methods for obtaining and
testing fusion proteins are further described below and in
Examples 3, 4 and 5.

The nucleic acid encoding the PA fusion protein
can encode any ligand domain that specifically binds a
cellular target, e.g., a cell surface receptor, an antigen
expressed on the cell surface, etc. For example, the nucleic
acid can encode a ligand domain that specifically binds to an
HIV protein expressed on the surface of an HIV-infected cell.
Such a ligand domain can be a single-chain antibody
expressed as a fusion protein as provided above and in
Examples 3, 4 and 5. Alternatively, the nucleic acid can
encode, for example, a ligand domain that is a growth factor,
as provided in Example 3.

Although the PA encoding sequence of the nucleic
acid encoding the PA fusion proteins of this invention need
only include the nucleotide sequence encoding the
translocation domain and LF binding domain of the native PA
protein, the nucleic acid can further comprise the nucleotide
sequence encoding the remainder of the native PA protein. Any
sequences to be included beyond those required can be
determined based on routine considerations such as ease of
manipulation of the nucleic acid, ease of expression of the
product in the host, and any effect on
translocation/internalization as taught in the examples.

Proteins 

Proteins encoded by the nucleic acids of the present
invention are also provided.

LF Fusion Proteins

The present invention provides LF fusion proteins
encoded by the nucleic acids of the invention as described
above and in the examples. Specifically, fusions of the LF
gene with domains II, Ib, and III of PE can be made by
recombinant methods to produce in-frame translational fusions.
Recombinant genes (e.g., SEQ ID NOS: 5, 7 and 9) were
expressed in Escherichia coli (E. coli), and the purified
proteins were tested for activity on cultured cells as
provided in Examples 1 and 2. Certain fusion proteins are
efficiently internalized via the PA receptor to the cytosol.
These examples demonstrate that this system can be used to
deliver many different polypeptides into targeted cells.

Although specific examples of these proteins are
provided, given the present teachings regarding the
preparation of LF fusion proteins, other embodiments having
other activity inducing domains can be practiced using routine
skill.

Using current methods of genetic manipulation, a
variety of other activity inducing moieties (e.g.,
polypeptides) can be translated as fusion proteins with LF,
which in turn can be internalized by cells when administered
with PA or PA fusion proteins. Fusion proteins generated by
this method can be screened for the desired activity using the
methods set forth in the Examples and by various routine
procedures. Based on the data presented here, the present
invention provides a highly effective system for delivery of
an activity inducing moiety into cells.

PA Fusion Proteins

The present invention provides PA fusion proteins
encoded by the nucleic acids of the invention. Specifically,
fusions of PA with single-chain antibodies and with CD4 are
provided.

Using current methods of genetic manipulation, a
variety of other ligand domains (e.g., polypeptides) can be
translated as fusion proteins with PA, which in turn can
specifically target cells and facilitate internalization of LF
or LF fusion proteins. Based on the data presented here, the
present invention provides a highly effective system for
delivery of an activity inducing moiety into a particular type
or class of cells.

Although specific examples of these proteins are
provided, given the present teachings regarding the
preparation of PA fusion proteins, other embodiments having
other ligand domains can be practiced using routine skill.
The fusion proteins generated can be screened for the desired
specificity and activity utilizing the methods set forth in
the Examples and by various routine procedures. In any case,
the PA fusion proteins encoded by the nucleic acids of the
present invention must be able to specifically bind the
selected target cell, bind LF or LF fusions or conjugates, and
internalize the LF fusion or conjugate.

Conjugates

A composition comprising the PA binding domain of
the native LF protein chemically attached to an activity
inducing moiety is provided. Such an activity inducing moiety
confers an activity not present in native LF. The activity
inducing moiety can be, for example, a polypeptide, a
radioisotope, an antisense nucleic acid, or a nucleic acid
encoding a desired gene product.
Using current methods of chemical manipulation, a
variety of other moieties (e.g., polypeptides, nucleic acids,
radioisotopes, etc.) can be chemically attached to LF,
internalized into cells, and express their activity when
administered with PA or PA fusion proteins. The compounds can
be tested for the desired activity and internalization
following the methods set forth in the Examples. For example,
the present invention provides an LF protein fragment 1-254
(LF1-254) with a cysteine residue added at the end of LF1-254
(LF1-254Cys). Since there are no other cysteines in LF, this
single cysteine provides a convenient attachment point through
which to chemically conjugate other proteins or non-protein
moieties.

Vectors and Hosts

A vector comprising the nucleic acids of the present
invention is also provided. The vectors of the invention can
be in a host capable of expressing the protein encoded by the
nucleic acid.

To express the proteins and conjugates of the
present invention, the nucleic acids can be operably linked to
signals that direct gene expression. A nucleic acid is
"operably linked" when it is placed into a functional
relationship with another nucleic acid sequence. For
instance, a promoter or enhancer is operably linked to a
coding sequence if it affects the transcription of the
sequence. Generally, operably linked means that the nucleic
acid sequences being linked are contiguous and, where
necessary to join two protein coding regions, contiguous and
in reading frame.
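The "contiguous and in reading frame" requirement for joining two protein coding regions can be illustrated with a minimal Python sketch. The helper `in_frame_fusion` is hypothetical, and it assumes DNA sequences written 5' to 3', with the upstream region starting at its first codon and lacking a stop codon. It checks only the two conditions named in the text (frame preservation and no premature stop); it does not model promoters or other regulatory signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def in_frame_fusion(upstream_cds: str, downstream_cds: str) -> str:
    """Join two protein coding regions so the downstream one is read in the
    same frame as the upstream one.

    Hypothetical helper: assumes both inputs are DNA written 5'->3', that
    upstream_cds begins at its first codon, and that it must carry no stop
    codon (otherwise translation would end before the downstream region).
    """
    up = upstream_cds.upper()
    down = downstream_cds.upper()
    if len(up) % 3 != 0:
        # A length that is not a multiple of 3 would shift the downstream frame.
        raise ValueError("upstream coding region length is not a multiple of 3")
    codons = (up[i:i + 3] for i in range(0, len(up), 3))
    if any(codon in STOP_CODONS for codon in codons):
        raise ValueError("upstream coding region contains an in-frame stop codon")
    # Contiguous joining: no bases are inserted or removed at the junction.
    return up + down


print(in_frame_fusion("ATGGCT", "aaataa"))  # prints ATGGCTAAATAA
```

Rejecting a bad junction with an exception, rather than silently padding the sequence, keeps the sketch from hiding a frameshift.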

The gene encoding a protein of the invention can be
inserted into an "expression vector", "cloning vector", or
"vector", terms which usually refer to plasmids or other
nucleic acid molecules that are able to replicate in a chosen
host cell. Expression vectors can replicate autonomously, or
they can replicate by being inserted into the genome of the
host cell. Vectors that replicate autonomously will have an
origin of replication or autonomous replicating sequence (ARS)
that is functional in the chosen host cell(s). Often, it is
desirable for a vector to be usable in more than one host
cell, e.g., in E. coli for cloning and construction, and in a
mammalian cell for expression.

The particular vector used to transport the genetic
information into the cell is also not particularly critical.
Any of the conventional vectors used for expression of
recombinant proteins in prokaryotic or eukaryotic cells can be
used.

The expression vectors typically have a
transcription unit or expression cassette that contains all
the elements required for the expression of the DNA encoding a
protein of the invention in the host cells. A typical
expression cassette contains a promoter operably linked to the
DNA sequence encoding the protein, and signals required for
efficient polyadenylation of the transcript. The promoter is
preferably positioned about the same distance from the
heterologous transcription start site as it is from the
transcription start site in its natural setting. As is known
in the art, however, some variation in this distance can be
accommodated without loss of promoter function.

The DNA sequence encoding the protein of the
invention can be linked to a cleavable signal peptide sequence
to promote secretion of the encoded protein by the transformed
cell. Additional elements of the vector can include, for
example, selectable markers and enhancers. Selectable
markers, e.g., tetracycline resistance or hygromycin
resistance, permit detection and/or selection of those cells
transformed with the desired DNA sequences (see, e.g., U.S.
Patent 4,704,362).

Enhancer elements can stimulate transcription up to
1,000-fold from linked homologous or heterologous promoters.
Many enhancer elements derived from viruses have a broad host
range and are active in a variety of tissues. For example,
the SV40 early gene enhancer is suitable for many cell types.
Other enhancer/promoter combinations that are suitable for the
present invention include those derived from polyoma virus,
human or murine cytomegalovirus, the long terminal repeat from
various retroviruses such as murine leukemia virus, murine or
Rous sarcoma virus, and HIV. See Enhancers and Eukaryotic
Expression, Cold Spring Harbor Press, Cold Spring Harbor, N.Y.
1983, which is incorporated herein by reference.

In addition to a promoter sequence, the expression
cassette should also contain a transcription termination
region downstream of the structural gene to provide for
efficient termination. The termination region can be obtained
from the same gene as the promoter sequence or can be obtained
from a different gene.

For more efficient translation in mammalian cells of
the mRNA transcribed from the structural gene, polyadenylation
sequences are also commonly added to the vector construct.
Two distinct sequence elements are required for accurate and
efficient polyadenylation: GU- or U-rich sequences located
downstream from the polyadenylation site, and a highly
conserved sequence of six nucleotides, AAUAAA, located 11-30
nucleotides upstream. Termination and polyadenylation signals
that are suitable for the present invention include those
derived from SV40, or a partial genomic copy of a gene already
resident on the expression vector.
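The upstream element described above can be located with a simple scan. The Python sketch below is a minimal illustration under stated assumptions: the helper `polya_signals` and its position convention are hypothetical (the distance is taken as the number of nucleotides between the end of the AAUAAA hexamer and the polyadenylation site), and the downstream GU- or U-rich element is not checked at all.

```python
HEXAMER = "AAUAAA"


def polya_signals(rna: str, polya_site: int, min_gap: int = 11, max_gap: int = 30) -> list[int]:
    """Return 0-based start positions of AAUAAA hexamers lying min_gap..max_gap
    nucleotides upstream of the polyadenylation site.

    Assumed convention (hypothetical): polya_site is the 0-based index of the
    cleavage position, and the gap is the number of nucleotides between the
    hexamer's last A and polya_site. The downstream GU/U-rich element is not checked.
    """
    seq = rna.upper().replace("T", "U")  # also accept a DNA sequence
    hits = []
    for start in range(len(seq) - len(HEXAMER) + 1):
        if seq[start:start + len(HEXAMER)] == HEXAMER:
            gap = polya_site - (start + len(HEXAMER))
            if min_gap <= gap <= max_gap:
                hits.append(start)
    return hits


# Toy transcript: hexamer at index 2, then 15 nt, then a U-rich stretch.
toy = "GG" + HEXAMER + "C" * 15 + "UUUUGU"
print(polya_signals(toy, polya_site=23))  # prints [2]
```

Passing the 11-30 nucleotide window as default parameters keeps the numbers from the text visible while allowing other windows to be tried.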

The vectors containing the gene encoding the protein
of the invention are transformed into host cells for
expression. "Transformation" refers to the introduction of
vectors containing the nucleic acids of interest directly into
host cells by well-known methods. The particular procedure
used to introduce the genetic material into the host cell for
expression of the protein is not particularly critical. Any
of the well-known procedures for introducing foreign
nucleotide sequences into host cells can be used. It is only
necessary that the particular procedure utilized be capable of
successfully introducing at least one gene into the host cell
which is capable of expressing the gene.

Transformation methods, which vary depending on the
type of host cell, include electroporation; transfection
employing calcium chloride, rubidium chloride, calcium
phosphate, DEAE-dextran, or other substances; microprojectile
bombardment; lipofection; infection (where the vector is an
infectious agent); and other methods. See, generally,
Sambrook et al. (1989) supra, and Current Protocols in
Molecular Biology, supra. Reference to cells into which the
nucleic acids described above have been introduced is meant to
also include the progeny of such cells.

There are numerous prokaryotic expression systems
known to one of ordinary skill in the art that are useful for
the expression of the proteins of the invention. E. coli is
commonly used, and other microbial hosts suitable for use
include bacilli, such as Bacillus subtilis, and
enterobacteriaceae, such as Salmonella and Serratia, as well
as various Pseudomonas species. One can make expression
vectors for use in these prokaryotic hosts; the vectors will
typically contain expression control sequences compatible with
the host cell (e.g., an origin of replication, a promoter).
Any of a variety of well-known promoters can be used, such as
the lactose promoter system, a tryptophan (Trp) promoter
system, a beta-lactamase promoter system, or a promoter from
phage lambda. The promoters will typically control
expression, optionally with an operator sequence, and have
ribosome binding site sequences, for example, for initiating
and completing transcription and translation. If necessary,
an amino-terminal methionine can be provided by insertion of a
Met codon 5' to and in frame with the codons for the protein.
Also, the carboxy-terminal end of the protein can be removed
using standard oligonucleotide mutagenesis procedures, if
desired.
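Inserting a Met codon "5' to and in frame with" the codons for the protein amounts to prepending a single ATG codon, which shifts every downstream codon by exactly three bases and so leaves the reading frame unchanged. A minimal Python sketch follows; the helper name `add_start_codon` is hypothetical, and it assumes a DNA coding sequence written 5' to 3' whose first base is the first base of a codon.

```python
def add_start_codon(cds: str) -> str:
    """Return the coding sequence with an ATG (Met) codon 5' to and in frame
    with it, unless it already starts with ATG.

    Hypothetical helper; assumes a DNA coding sequence written 5'->3' whose
    first base is the first base of a codon.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if cds.startswith("ATG"):
        return cds  # already begins with a Met codon
    return "ATG" + cds  # prepending 3 bases preserves the reading frame


print(add_start_codon("gctaaa"))  # prints ATGGCTAAA
```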

Host bacterial cells can be chosen that are mutated
to be reduced in or free of proteases, so that the proteins
produced are not degraded. For Bacillus expression systems in
which the proteins are secreted into the culture medium,
strains are available that are deficient in secreted
proteases.

Mammalian cell lines can also be used as host cells
for the expression of polypeptides of the invention.
Propagation of mammalian cells in culture is per se well
known. See Tissue Culture, Academic Press, Kruse and
Patterson, ed. (1973). Host cell lines may also include such
organisms as bacteria (e.g., E. coli or B. subtilis), yeast,
filamentous fungi, plant cells, or insect cells, among others.

Purification of Protein

After standard transfection or transformation
methods are used to produce prokaryotic, mammalian, yeast, or
insect cell lines that express large quantities of the protein
of the invention, the protein is purified using standard
techniques known in the art. See, e.g., Colley et al. (1989)
J. Biol. Chem. 264:17619-17622; and Methods in Enzymology,
"Guide to Protein Purification", M. Deutscher, ed. Vol. 182
(1990).

Standard procedures of the art that can be used to
purify proteins of the invention include ammonium sulfate
precipitation, affinity and fraction column chromatography,
gel electrophoresis and the like. See, generally, Scopes, R.,
Protein Purification, Springer-Verlag, New York (1982), and
U.S. Pat. No. 4,512,922, disclosing general methods for
purifying protein from recombinantly engineered bacteria.

If the expression system causes the protein of the
invention to be secreted from the cells, the recombinant cells
are grown and the protein is expressed, after which the
culture medium is harvested for purification of the secreted
protein. The medium is typically clarified by centrifugation
or filtration to remove cells and cell debris, and the proteins
can be concentrated by adsorption to any suitable resin such
as, for example, CDP-Sepharose, asialoprothrombin-Sepharose
4B, or Q Sepharose, or by use of ammonium sulfate
fractionation, polyethylene glycol precipitation, or
ultrafiltration. Other means known in the art are equally
suitable. Further purification of the protein can be
accomplished by standard techniques, for example, affinity
chromatography, ion exchange chromatography, sizing
chromatography, or other protein purification techniques used
to obtain homogeneity. The purified proteins are then used to
produce pharmaceutical compositions, as described below.

Alternatively, vectors can be employed that express the protein intracellularly, rather than secreting it from the cells. In these cases, the cells are harvested and disrupted, and the protein is purified from the cellular extract, e.g., by standard methods. If the cell line has a cell wall, initial extraction in a low-salt buffer may allow the protein to pellet with the cell wall fraction; the protein can then be eluted from the cell wall with high salt concentrations and dialyzed. If the cell line glycosylates the protein, purification of the glycoprotein may be enhanced by using a concanavalin A (Con A) column. Anion exchange columns (MonoQ, Pharmacia) and gel filtration columns may be used to further purify the protein. A highly purified preparation can be achieved, at the expense of activity, by denaturing preparative polyacrylamide gel electrophoresis.

Protein analogs can be produced in multiple conformational forms which are detectable under nonreducing chromatographic conditions. Removal of those species having a low specific activity is desirable and is achieved by a variety of chromatographic techniques, including anion exchange or size exclusion chromatography.

Recombinant analogs can be concentrated by pressure dialysis and buffer-exchanged directly into volatile buffers (e.g., N-ethylmorpholine (NEM), ammonium bicarbonate, ammonium acetate, and pyridine acetate). Samples can be freeze-dried directly from such volatile buffers, resulting in a stable protein powder devoid of salt and detergents. Freeze-dried samples of recombinant analogs can then be efficiently resolubilized before use in buffers compatible with infusion (e.g., phosphate-buffered saline). Other suitable buffers might include hydrochloride, hydrobromide, sulfate, acetate, benzoate, malate, citrate, glycine, glutamate, and aspartate.

Specific Embodiments

Toxins Modified to Contain Intracellular Pathogen Protease Recognition Sites

One aspect of the invention exploits the fact that PA and other toxins must be proteolytically cleaved in order to acquire activity, in conjunction with the fact that some cells infected with an intracellular pathogen (for example, HIV-infected cells) possess an active protease that has a relatively narrow substrate specificity. The protease site found in the native toxin is replaced with a site recognized by the intracellular pathogen's specific protease. Thus, the protease in cells infected by the intracellular pathogen cleaves the modified toxin, which then becomes active and kills the cell.

Intracellular pathogens that can be targeted by the products and methods of the present invention include any pathogen that produces a protease having a specific recognition site. Such pathogens can include prokaryotes (including Rickettsia, Mycobacterium tuberculosis, etc.), mycoplasma, eukaryotic pathogens (e.g., pathogenic fungi), and viruses. One example of an intracellular pathogen that produces a specific protease is human immunodeficiency virus (HIV). The HIV-1 protease cleaves viral polyproteins to generate functional structural proteins as well as the reverse transcriptase and the protease itself. HIV-1 replication and viral infectivity are absolutely dependent on the action of the HIV-1 protease.

An intracellular pathogen specific protease site can be introduced into any natural or recombinant toxin for which proteolytic cleavage is required for toxicity. For example, one can replace the trypsin cleavage site (R164-167) of anthrax PA with the HIV-1 protease site. Alternatively, the diphtheria toxin disulfide loop sequence (see O'Hare, et al. FEBS 273(1,2):200-204 (Oct. 1990)) can be replaced with the HIV-1 protease cleavage site in order to obtain a toxin specific to HIV-1 infected cells. Similarly, the naturally occurring diphtheria toxin sequence at residues 191-194 (Williams, et al. J. Biol. Chem. 265(33):20673-20677 (1990)) can be replaced by an intracellular pathogen specific protease site such as the HIV-1 protease cleavage sequence. The DAB486-IL-2 fusion toxin of Williams and the improved DAB389-IL-2 toxin are effective on HIV-1 infected cells, which express high levels of the IL-2 receptor. Williams, J. Biol. Chem. 265:20673. Addition of the HIV-1 protease cleavage site would provide a further degree of specificity. Similarly, botulinum C2 toxin resembles anthrax toxin in requiring cleavage within a native protein subunit (see Ohishi and Yanagimoto, Infection and Immunity 60(11):4648-4655 (Nov. 1992)), so it too can be made specific for cells infected by an intracellular pathogen such as HIV-1.

In one embodiment of the invention, the protease site of PA is replaced by the site recognized by the HIV-1 protease. The cellular protease that cleaves PA absolutely requires the presence of the Arg164 and Arg167 residues, because replacement of either residue yields a PA molecule which is not cleaved after binding to the cell surface. However, any PA substitution mutant which retains at least one Arg or Lys residue within residues 164-167 can be activated by treatment with trypsin. Because the PA63 fragments produced by trypsin digestion have a variety of different amino-terminal residues, there is clearly no strict constraint on the identity of the terminal residues. Klimpel, et al., Proc. Natl. Acad. Sci. 89:10277-10281 (1992).

Replacement of residues 164-167 of PA with residues that match the HIV-1 protease recognition site can render exogenously added PA inactive on cells which do not possess the HIV-1 protease. However, cells that do express the HIV-1 protease (i.e., cells infected with HIV-1 or cells engineered to produce the protease) would cleave, and thereby activate, the mutant PA. The activated PA proteins can then bind and internalize exogenously added cytotoxic fusion proteins, such as LF-PE.

Based on extensive studies of the substrate specificity of the protease, several PA variants relating to the invention were designed and produced. These are shown below, with the residues between which cleavage occurred underlined. PA proteins which have been mutated to replace R164-167 with an amino acid sequence recognized by the HIV-1 protease are referred to as "PAHIV."

PAHIV#1   QVSQNYPIVQNI
PAHIV#2   NTATIMMQRGNF
PAHIV#3   TVSFNFPQITLW
PAHIV#4   GGSAFNFPIVMGG

The mutant proteins PAHIV#1-4 were cleaved correctly by the HIV-1 protease.

Table 1 shows the amino acids and their corresponding abbreviations and symbols.

Table 1

One-letter   Three-letter   Amino acid
A            Ala            Alanine
C            Cys            Cysteine
D            Asp            Aspartic acid
E            Glu            Glutamic acid
F            Phe            Phenylalanine
G            Gly            Glycine
H            His            Histidine
I            Ile            Isoleucine
K            Lys            Lysine
L            Leu            Leucine
M            Met            Methionine
N            Asn            Asparagine
P            Pro            Proline
Q            Gln            Glutamine
R            Arg            Arginine
S            Ser            Serine
T            Thr            Threonine
V            Val            Valine
W            Trp            Tryptophan
Y            Tyr            Tyrosine
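
For illustration only, the mapping in Table 1 can be expressed as a small Python lookup and applied to the PAHIV sequences listed above. This is a sketch, not part of the original disclosure; the names THREE_LETTER, to_three_letter, and PAHIV are hypothetical.

```python
# One-letter to three-letter amino acid codes, following Table 1.
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(seq: str) -> str:
    """Convert a one-letter protein sequence to hyphen-joined three-letter codes.

    Whitespace (for example, stray extraction gaps such as "TVS FNFPQ") is
    skipped; a letter not listed in Table 1 raises KeyError.
    """
    return "-".join(THREE_LETTER[aa] for aa in seq.upper() if not aa.isspace())


# The PAHIV replacement sequences listed in the text.
PAHIV = {
    1: "QVSQNYPIVQNI",
    2: "NTATIMMQRGNF",
    3: "TVSFNFPQITLW",
    4: "GGSAFNFPIVMGG",
}

for number, seq in PAHIV.items():
    print(f"PAHIV#{number}: {to_three_letter(seq)}")
```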



Preferably, the mutations at R164-167 of PA are accomplished by cassette mutagenesis, although other methods are feasible, as discussed below. In summary, three pieces of DNA are joined together. The first piece has vector sequences and encodes the "front half" (5' end of the gene) of the PA protein; the second is a short piece of DNA (a cassette) that encodes a small middle piece of the PA protein; and the third encodes the "back half" (3' end of the gene) of PA. The cassette contains codons for the amino acids required to complete the cleavage site for the intracellular pathogen protease. This method was used to make mutants in the plasmid pYS5, although other plasmids could be employed.
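
The reading-frame requirement implicit in this three-piece assembly can be sketched in Python. The sketch makes simplifying assumptions not stated in the text: the sequences are short hypothetical placeholders (not PA, pYS5, or actual cassette DNA), the front piece is taken to begin at the start codon, and restriction-site overhangs are ignored. The function name assemble_cassette is invented for the example.

```python
def assemble_cassette(front: str, cassette: str, back: str) -> str:
    """Join front half, cassette, and back half, checking the reading frame.

    Assumes `front` starts at the first base of the start codon, so the
    coding frame is fixed at position 0. The front piece and the cassette
    must then each contain whole codons, or everything downstream of the
    junction would be read out of frame.
    """
    for name, piece in (("front", front), ("cassette", cassette)):
        if len(piece) % 3 != 0:
            raise ValueError(f"{name} piece ({len(piece)} nt) would shift the reading frame")
    return front + cassette + back


# Hypothetical placeholder sequences, not actual PA or pYS5 DNA.
front = "ATGGCTAAA"     # three codons
cassette = "GCTGCTGCT"  # three codons standing in for a replacement site
back = "GGCTAA"         # continues in frame to a stop codon

print(assemble_cassette(front, cassette, back))  # ATGGCTAAAGCTGCTGCTGGCTAA
```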

Alternatively, the mutations can be made by use of the polymerase chain reaction (PCR) and other methods, as discussed below. PCR duplicates a segment of DNA many times, resulting in an amplification of that segment. The reaction produces enough of the DNA segment that it can be modified with restriction enzymes and cloned. During the reaction, synthetic oligonucleotide primers are used to start the duplication of the target DNA segment. Each synthetic primer can be designed to introduce novel DNA sequences into the DNA molecule or to change existing DNA sequences.
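
Under ideal conditions each PCR cycle doubles the target segment, so n cycles starting from N0 copies yield about N0(1 + E)^n copies, where E is the per-cycle efficiency (E = 1 for perfect doubling). The Python sketch below illustrates that textbook model; the constant-efficiency assumption and the function name ideal_copies are illustrative, and real reactions plateau in later cycles.

```python
def ideal_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Estimated copies of a target segment after `cycles` rounds of PCR.

    Textbook exponential model: N = N0 * (1 + E) ** n. Assumes a constant
    per-cycle efficiency E (1.0 = perfect doubling) and ignores the plateau
    reached when primers, nucleotides, or polymerase become limiting.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(ideal_copies(1, 30))          # 1073741824.0, i.e. 2**30
print(ideal_copies(1000, 30, 0.9))  # fewer copies at 90% efficiency
```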



Modification of Toxins to Broaden or Alter Target Cell Specificity

Another aspect of the invention involves compounds and methods for broadening or changing the range of cell types against which a toxin is effective. For example, the lethal anthrax toxin, PA+LF, is acutely toxic to mouse macrophage cells, apparently due to the specific expression in these cells of a target for the catalytic activity of LF. Other cell types are not affected by LF. In the present invention, however, LF is used to construct cytotoxins having broad cell specificity.

A detailed analysis of the domains of LF identified the amino-terminal 254 amino acids as the region that binds to PA63. Fusion proteins containing residues 1-254 of LF and the ADP-ribosylation domain of Pseudomonas exotoxin A (PE) were designed according to the invention. These fusion proteins are highly toxic to cultured cells, but only when PA is administered simultaneously.

Synthesis of Genes that Encode Proteins of the Invention

Genes that encode toxins having altered protease recognition sites, or fusion proteins having a binding domain from one protein and an activity inducing domain of a second protein, can be synthesized by methods known to those skilled in the art. As an example of techniques that can be used, the synthesis of genes encoding the modified anthrax toxin subunits LF and PA is now described.

The DNA sequences for native PA and LF are known. Knowledge of these DNA sequences facilitates the preparation of genes and can be used as a starting point to construct DNA molecules that encode mutants of PA and/or LF. The protein mutants of the invention are soluble and include internal amino acid substitutions. Furthermore, these mutants are purified from, or secreted from, cells that have been transfected or transformed with plasmids containing genes which encode these proteins. Methods for making modifications to cloned genes, such as amino acid substitutions, deletions, or the addition of signal sequences, are known. Specific methods used herein are described below.

The gene for PA or LF can be prepared by several methods. Genomic and cDNA libraries are commercially available. Oligonucleotide probes specific to the desired gene can be synthesized using the known gene sequence. Methods for screening genomic and cDNA libraries with oligonucleotide probes are known. A genomic or cDNA clone can provide the necessary starting material to construct an expression plasmid for the desired protein using known methods.

A protein-encoding DNA fragment can be cloned by taking advantage of restriction endonuclease sites which have been identified in regions which flank or are internal to the gene. See Sambrook, et al., Molecular Cloning: A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press (1989), "Sambrook" hereinafter.

Genes encoding the desired protein can be made from wild-type genes constructed using the gene encoding the full-length protein. One method for producing wild-type genes for subsequent mutation combines the use of synthetic oligonucleotide primers with polymerase extension on an mRNA or DNA template. This PCR method amplifies the desired nucleotide sequence. U.S. Patents 4,683,195 and 4,683,202 describe this method. Restriction endonuclease sites can be incorporated into the primers. Genes amplified by PCR can be purified from agarose gels and cloned into an appropriate vector. Alterations in the natural gene sequence can be introduced by techniques such as in vitro mutagenesis and PCR using primers that have been designed to incorporate appropriate mutations.

The proteins described herein can be expressed intracellularly and purified, or can be secreted when expressed in cell culture. If desired, secretion can be obtained by use of the native signal sequence of the gene. Alternatively, genes encoding the proteins of the invention can be ligated in proper reading frame to a signal sequence other than that corresponding to the native gene. Although the PA recombinant proteins of the invention are typically expressed in B. anthracis, they can be expressed in other hosts, such as E. coli.

The proteins of this invention are described by their amino acid sequences and by their nucleotide sequences, it being understood that the proteins include their biological equivalents, such that this invention includes minor or inadvertent substitutions and deletions of amino acids that have substantially little impact on the biological properties of the analogs. In some circumstances it may be feasible to substitute rare or non-naturally occurring amino acids for one or more of the twenty common amino acids listed in Table 1. Examples include ornithine and acetylated or hydroxylated forms. See generally Stryer, L., Biochemistry, 3d ed. (1988).

Alternative nucleotide sequences can be used to 
express analogs in various host cells. Furthermore, due to 
the degeneracy of the genetic code, equivalent codons can be 
substituted to encode the same polypeptide sequence. 
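
Codon degeneracy can be illustrated with a short Python sketch: two different DNA sequences, built from synonymous codons, translate to the same peptide. Only the handful of standard-code codons needed for the example are included, and the sequences and the helper translate are hypothetical.

```python
# A small subset of the standard genetic code, enough for this example.
CODON_TABLE = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # Ala
    "CTG": "L", "CTT": "L", "TTA": "L",              # Leu
    "AAA": "K", "AAG": "K",                          # Lys
    "TAA": "*",                                      # stop
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence using CODON_TABLE ('*' = stop)."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Two different coding sequences built from synonymous codons.
original = "GCTCTGAAATAA"
recoded = "GCGTTAAAGTAA"
print(translate(original), translate(recoded))  # ALK* ALK*
```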

Additionally, sequences (nucleotide and amino acid) with substantial identity to those of the invention are also included. Identity in this sense means identity of both the units (base pairs or amino acids) and their order. Substantial identity includes entities that are greater than 80% identical. Preferably, substantial identity refers to greater than 90% identity. More preferably, it refers to greater than 95% identity.
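
The identity thresholds above can be made concrete with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length (alignment itself, and how gaps should be scored, are outside its scope), counts a gap as a mismatch, and uses hypothetical function names; only the 80/90/95% cut-offs come from the text.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    A position counts as identical only when both sequences carry the same
    residue (base or amino acid) there, so order matters. Gap characters
    ("-") are counted as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b and a != "-" for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def identity_tier(pct: float) -> str:
    """Classify a percent identity against the thresholds stated in the text."""
    if pct > 95:
        return "more preferred (>95%)"
    if pct > 90:
        return "preferred (>90%)"
    if pct > 80:
        return "substantial (>80%)"
    return "below substantial identity"


pct = percent_identity("ACGTACGTAC", "ACGTACGTTC")
print(pct, identity_tier(pct))  # 90.0 substantial (>80%)
```

Because the text's thresholds are strict ("greater than"), exactly 90% identity falls in the >80% tier rather than the >90% tier.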

Mutagenesis 

Mutagenesis can be performed to yield point mutations, deletions, or insertions to alter the specific regions of the genes described above. Point mutations can be introduced by a variety of methods, including chemical mutagenesis, mutagenic copying methods, and site-specific mutagenesis methods using synthetic oligonucleotides.

WO 94/18332                                   PCT/US94/01624

Cassette mutagenesis methods are conveniently used to introduce point mutations into the specified regions of the PA or LF genes. A double-stranded oligonucleotide region containing alterations in the specified sequences of the gene is prepared. This oligonucleotide cassette can be prepared by synthesizing an oligonucleotide carrying the sequence alteration in residues of the PA or LF gene, annealing it to a primer, elongating with the large fragment of DNA polymerase, and trimming with BstBI. This double-stranded oligonucleotide is ligated to the BamHI/BstBI fragment from pYS5 and the PpuMI-BamHI fragment from pYS6 to produce an intact recombinant DNA. Other methods of producing the double-stranded oligonucleotides and other recombinant DNA vectors can be practiced.

Chemical mutagenesis can be performed using the M13 vector system. A single-stranded M13 recombinant DNA containing recombinant PA or LF DNA is prepared. Another M13 recombinant containing the same recombinant DNA, but in double-stranded form, is used to prepare a deletion in the targeted region of the gene. This double-stranded M13 recombinant is cleaved into a linear molecule with an endonuclease, denatured, and annealed with the single-stranded M13 recombinant, resulting in a single-stranded gap in the target region of the PA or LF DNA.

This gapped M13 recombinant DNA is then treated with a compound such as sodium bisulfite to deaminate the cytosine residues in the single-stranded DNA region to uracil. This results in limited and specific mutations in the single-stranded DNA region. Finally, the gap in the DNA is filled in by incubation with DNA polymerase, resulting in a U-A base pair in place of the G-C base pair present in the unmutated gene. Upon replication, the new recombinant gene contains T-A base pairs, which are point mutations from the original sequence. Other forms of chemical mutagenesis are also available.
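
The base changes produced by bisulfite treatment of the gap can be traced with a toy Python model. It assumes complete C-to-U conversion inside the gap, whereas the text notes that the real reaction produces limited mutations, and it uses a hypothetical sequence and function name.

```python
def bisulfite_mutate(seq: str, gap_start: int, gap_end: int) -> tuple[str, str]:
    """Model sodium bisulfite mutagenesis confined to a single-stranded gap.

    `seq` is the strand exposed in the gap, written 5'->3'. Every C inside
    seq[gap_start:gap_end] is deaminated to U (a simplification of the
    limited conversion seen in practice). After gap filling the U pairs with
    A, and in the progeny that position reads T (a T-A pair where the gene
    originally had a G-C pair). Bases outside the gap are untouched.

    Returns (strand after deamination, same-orientation progeny sequence).
    """
    gap = seq[gap_start:gap_end].replace("C", "U")
    deaminated = seq[:gap_start] + gap + seq[gap_end:]
    replicated = deaminated.replace("U", "T")
    return deaminated, replicated


# Hypothetical sequence; the gap covers positions 2-5 (0-based).
before, after = bisulfite_mutate("GACCTGCAGT", 2, 6)
print(before)  # GAUUTGCAGT
print(after)   # GATTTGCAGT  (the C at position 6 lies outside the gap)
```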

Mutagenic copying of the PA or LF recombinant DNA can be carried out using several methods. For example, a single-stranded gapped DNA region is created as described above. This region is incubated with DNA polymerase I and one or more mutagenic analogs of the normal deoxyribonucleoside triphosphates. Copying of the single-stranded region with the DNA polymerase incorporates the mutagenic analogs as the single-stranded gap region is filled in. Transfection and replication of the resulting DNA results in production of some mutated recombinant DNAs for PA, LF, or EF, which can then be selected by cloning. Other mutagenic copying methods can be used.

Point mutations can be introduced into the specified regions of the PA or LF genes by methods using synthetic oligonucleotides for site-specific mutagenesis. PCR copying of the PA or LF genes is performed using oligonucleotide primers that cover the specified target regions and contain modifications from the wild-type sequence in these regions. The PA gene in a pYS5 vector can be PCR amplified using this method to produce mutations at positions 164-167. PCR amplification can also be used to introduce mutations in the target region of the LF gene.

Synthetic oligonucleotide methods of introducing point mutations can be performed using heteroduplex DNA. An M13 recombinant DNA vector containing the PA or LF gene is prepared, and a single-stranded M13 recombinant is produced. A single-stranded oligonucleotide containing an alteration in the specified target sequence of the PA or LF gene is annealed to the single-stranded M13 recombinant to produce a mismatched sequence. Incubation with DNA polymerase I results in a double-stranded M13 recombinant containing base pair mismatches in the specified region of the gene. This M13 recombinant is replicated in a host such as B. anthracis or E. coli to produce both wild-type and mutant M13 recombinants. The mutated M13 recombinants are cloned and isolated. Other vector systems for mutagenesis involving synthetic oligonucleotides and heteroduplex formation may also be applicable.

Expression of Proteins in Prokaryotic Cells 

In addition to the use of cloning methods in bacteria such as Bacillus anthracis for amplification of cloned sequences, it may be desirable to express the proteins in other prokaryotes. It is possible to recover a functional protein from E. coli transformed with an expression plasmid encoding a PA or LF protein. Conveniently, the mutated PA proteins of the invention were expressed in B. anthracis and the LF fusion proteins were expressed in E. coli.

Methods for the expression of cloned genes in bacteria are well known. See Sambrook. To optimize expression of a cloned gene in a prokaryotic system, expression vectors can be constructed which include a promoter to direct mRNA transcription initiation and a sequence to direct transcription termination. The inclusion of selection markers in DNA vectors transformed into bacteria is useful. Examples of such markers include the genes specifying resistance to ampicillin, tetracycline, or chloramphenicol.

See Sambrook, previously cited, for details concerning selection markers and promoters for use in bacteria such as E. coli. In an embodiment of this invention, pYS5 is a vector for the subcloning and amplification of desired gene sequences, although other vectors could be used.

Strains of Bacillus anthracis producing mutated protein(s)

For PA protein production, B. anthracis strains cured of both pXO1 and pXO2 are preferred because they are avirulent. Examples of such strains are UM23C1-1 and UM44-1C9, obtained from Curtis Thorne, University of Massachusetts. Similar strains can be made by curing of plasmids, as described by P. Mikesell, et al., "Evidence for plasmid-mediated toxin production in Bacillus anthracis," Infect. Immun. 39:371-376 (1983).

See generally commonly assigned U.S. Patent 
Application Serial No. 08/042,745, filed April 5, 1993, 
incorporated by reference herein. 

Treatment Methods

A method for delivering a desired activity to a cell is provided. The steps of the method include administering to the cell (a) a protein comprising the translocation domain and the LF binding domain of the native PA protein and a ligand domain, and (b) a product comprising the PA binding domain of the native LF protein and a non-LF activity inducing moiety, whereby the product administered in step (b) is internalized into the cell and performs the activity within the cell.

The method of delivering an activity to a cell can use a ligand domain that is the receptor binding domain of the native PA protein. Other ligand domains are selected for their specificity for a particular cell type or class of cells. The specificity of the PA fusion protein for the targeted cell can be determined using standard methods and as described in Examples 2 and 3.

The method of delivering an activity to a cell can use an activity inducing moiety such as a polypeptide (for example, a growth factor or a toxin), an antisense nucleic acid, or a nucleic acid encoding a desired gene product. The actual activity inducing moiety used will be selected based on its functional characteristics, e.g., its activity.

A method of killing a tumor cell in a subject is also provided. The steps of the method can include administering to the subject a first fusion protein comprising the translocation domain and LF binding domain of the native PA protein and a tumor cell specific ligand domain, in an amount sufficient to bind to a tumor cell. A second fusion protein is also administered, comprising the PA binding domain of the native LF protein and a cytotoxic domain of a non-LF protein, in an amount sufficient to bind to the first protein, whereby the second protein is internalized into the tumor cell and kills the tumor cell.

The cytotoxic domain can be a toxin, or it can be another moiety not strictly defined as a toxin but having an activity that results in cell death. These cytotoxic moieties can be selected using standard tests of cytotoxicity, such as the cell lysis and protein synthesis inhibition assays described in the examples.

The invention further provides a method of killing HIV-infected cells in a subject. The method comprises the steps of administering to the subject a first fusion protein comprising the translocation domain and LF binding domain of the native PA protein and a ligand domain that specifically binds to an HIV protein expressed on the surface of an HIV-infected cell, in an amount sufficient to bind to an HIV-infected cell. The next step is administering to the subject a second fusion protein comprising the PA binding domain of the native LF protein and a cytotoxic domain of a non-LF protein, in an amount sufficient to bind to the first protein, whereby the second protein is internalized into the HIV-infected cell and kills the HIV-infected cell, thereby preventing propagation of HIV.

Although certain of the methods of the invention have been described as using LF fusion proteins, it will be understood that other LF compositions having chemically attached activity inducing moieties can be used in the methods.

The fusion proteins and other compositions of the invention can be administered by various methods, e.g., parenterally, intramuscularly, or intraperitoneally.

The amount necessary can be deduced from other receptor/ligand or antibody/antigen therapies and can be optimized by routine procedures. The exact amount of such LF and PA compositions required will vary from subject to subject, depending on the species, age, weight, and general condition of the subject, the severity of the disease being treated, the particular fusion protein or composition used, its mode of administration, and the like. Generally, dosage will approximate that which is typical for the administration of cell surface receptor ligands, and will preferably be in the range of about 2 µg/kg/day to 2 mg/kg/day.

Depending on the intended mode of administration, the compounds of the present invention can be in various pharmaceutical compositions. The compositions will include, as noted above, an effective amount of the selected protein in combination with a pharmaceutically acceptable carrier and, in addition, can include other medicinal agents, pharmaceutical agents, carriers, adjuvants, diluents, etc. By "pharmaceutically acceptable" is meant a material that is not biologically or otherwise undesirable, i.e., the material can be administered to an individual along with the fusion protein or other composition without causing any undesirable biological effects or interacting in a deleterious manner with any of the other components of the pharmaceutical composition in which it is contained.

Parenteral administration, if used, is generally characterized by injection. Injectables can be prepared in conventional forms, either as liquid solutions or suspensions, solid forms suitable for solution or suspension in liquid prior to injection, or as emulsions. A more recently devised approach for parenteral administration involves use of a slow release or sustained release system, such that a constant level of dosage is maintained. See, e.g., U.S. Patent No. 3,710,795, which is incorporated by reference herein.

Formulations and Administration 

Proteins of the invention such as PAHIV are typically mixed with a physiologically acceptable fluid prior to administration to a mammal such as a human. Examples of physiologically acceptable fluids include saline solutions such as normal saline, Ringer's solution, and generally mixtures of various salts, including potassium and phosphate salts, with or without sugar additives such as glucose. The proteins are administered parenterally, with intravenous administration being the most typical route. Either a bolus of the protein in solution or a slow infusion can be administered intravenously. The choice of a bolus or an infusion depends on the kinetics, including the half-life, of the protein in the patient. An appropriate evaluation of the time for delivery of the protein is well within the skill of the clinician.

Patients selected for treatment with PAHIV are infected with HIV-1 and may or may not be symptomatic. Optimally, the protein would be administered to an HIV-1 infected person who is not yet symptomatic. The dosage of a protein of the invention such as PAHIV typically ranges from about 5 to about 25 micrograms per kilogram of the patient's body weight; usually, the dose is about 10 micrograms per kilogram. The dosage is repeated at regular intervals, such as weekly for about 4 to 6 weeks. At that time the clinician may opt to evaluate the patient's immune status, including immunotolerance to the PAHIV, to decide on future treatment.
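
The dosing figures in the preceding paragraph reduce to simple arithmetic, sketched below in Python. The 70 kg body weight is a hypothetical example value and the function names are invented for illustration; the 5 to 25 µg/kg range, the usual 10 µg/kg dose, and weekly dosing for about 4 to 6 weeks are taken from the text.

```python
def dose_ug(body_weight_kg: float, dose_ug_per_kg: float = 10.0) -> float:
    """Amount for one administration, in micrograms.

    The 5-25 ug/kg range and the usual 10 ug/kg figure come from the text.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if not 5.0 <= dose_ug_per_kg <= 25.0:
        raise ValueError("dose lies outside the 5-25 ug/kg range described")
    return body_weight_kg * dose_ug_per_kg


def course_total_ug(body_weight_kg: float, weeks: int, dose_ug_per_kg: float = 10.0) -> float:
    """Cumulative amount for one dose per week over `weeks` weeks (4 to 6 in the text)."""
    return dose_ug(body_weight_kg, dose_ug_per_kg) * weeks


# Hypothetical 70 kg patient at the usual 10 ug/kg.
print(dose_ug(70))             # 700.0
print(course_total_ug(70, 6))  # 4200.0
```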

The foregoing description and the following examples are offered primarily for purposes of illustration. It will be readily apparent to those skilled in the art that the operating conditions, materials, procedural steps, and other parameters of the system described herein can be further modified or substituted in various ways without departing from the spirit and scope of the invention. For example, although human use has been discussed, veterinary use of the invention is also feasible. For instance, cats suffer from so-called feline AIDS, caused by feline immunodeficiency virus (FIV). Protective antigen can be altered to include a protease cleavage site specific for FIV. Thus, the invention is not limited by the description and examples, but rather by the appended claims.

EXAMPLE 1

Fusions of Anthrax Toxin Lethal Factor to the ADP-Ribosylation Domain of Pseudomonas Exotoxin

Reagents and General Procedures 

Restriction endonucleases and DNA modifying enzymes were purchased from GIBCO/BRL, Boehringer Mannheim, or New England Biolabs. Low melting point agarose (SeaPlaque) was obtained from FMC Corp. (Rockland, ME). Oligonucleotides were synthesized on a PCR Mate (Applied Biosystems) and purified on oligonucleotide purification cartridges (Applied Biosystems). The PCR was performed with a DNA amplification reagent (GeneAmp) from Perkin-Elmer Cetus Instruments and a thermal cycler (Perkin-Elmer Cetus). The amplification involved denaturation at 94°C for 1 min, annealing at 55°C for 2.5 min, and extension at 72°C for 3 min, for 30 cycles. A final extension was run at 72°C for 7 min. For amplification of PE fragments, 10% formamide was added to the reaction mixture to decrease the effect of high GC content. DNA sequencing reactions were done using Sequenase version 1.0 from U.S. Biochemical Corp., and DNA sequencing gels were made from Gel Mix 6 from GIBCO/BRL. [35S]deoxyadenosine 5'-[α-thio]triphosphate and L-[3,4,5-3H]leucine were purchased from DuPont-New England Nuclear. J774A.1 cells were obtained from the American Type Culture Collection. Chinese Hamster Ovary (CHO) cells (ATCC CCL 61) were obtained from Michael Gottesman (National Cancer Institute, National Institutes of Health).

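
As a quick check on the cycling program above, the total hold time can be computed in Python. The sketch assumes each step is timed exactly as stated and ignores ramp times between temperatures, which depend on the thermal cycler; the variable and function names are hypothetical.

```python
# Cycling steps from the paragraph above, in minutes per cycle.
CYCLE_STEPS_MIN = {
    "denaturation, 94°C": 1.0,
    "annealing, 55°C": 2.5,
    "extension, 72°C": 3.0,
}
CYCLES = 30
FINAL_EXTENSION_MIN = 7.0  # single extension at 72°C after the last cycle


def hold_time_minutes(steps: dict, cycles: int, final_extension: float) -> float:
    """Total time spent at hold temperatures, excluding ramping between steps."""
    return sum(steps.values()) * cycles + final_extension


print(hold_time_minutes(CYCLE_STEPS_MIN, CYCLES, FINAL_EXTENSION_MIN))  # 202.0
```

That is 6.5 min per cycle times 30 cycles, plus the 7 min final extension, or roughly 3.4 hours before ramp times are added.
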
Plasmid Construction 

Construction of plasmids containing LF-PE fusions was
performed as follows. Varying portions of the PE gene were
amplified by PCR, ligated in frame to the 3' end of the LF
gene, and inserted into the pVEX115 f+T expression vector
(provided by V. K. Chaudhary, National Cancer Institute,
National Institutes of Health). To construct the fusion proteins,
the 3' end of the native LF gene (including codon 776 of the
mature protein, specifying Ser) was ligated with the 5' ends
of sequences specifying varying portions of domains II, Ib,
and III of PE. The LF gene was amplified from the plasmid
pLF7 (Robertson, D. L. and Leppla, S. H. Gene 44:71-78, 1986)
by PCR using oligonucleotide primers which added KpnI and MluI
sites at the 5' and the 3' ends of the gene, respectively.
Similarly, varying portions of the PE gene (provided by David
FitzGerald, National Cancer Institute, National Institutes of
Health) were amplified by PCR so as to add MluI and EcoRI
sites at the 5' and 3' ends. The PCR product of the LF gene
was digested with KpnI and the DNA was precipitated; the LF
gene was subsequently treated with MluI. Similarly, the PCR
products of PE amplification were digested with MluI and
EcoRI. The expression vector pVEX115 f+T was cleaved with
KpnI and EcoRI separately and dephosphorylated. This vector
has a T7 promoter, an OmpA signal sequence, a multiple cloning
site, and a T7 transcription terminator. All the above DNA
fragments were purified from low-melting-point agarose, a
three-fragment ligation was carried out, and the product was
transformed into E. coli DH5α (ATCC 53868). The four
constructs described in this report have the entire LF gene
fused to varying portions of PE. The identity of each
construct was confirmed by sequencing the junction point using
a Sequenase kit (U.S. Biochemical Corp.). For expression,
recombinant plasmids were transformed into E. coli strain
SA2821 (provided by Sankar Adhya, National Cancer Institute,
National Institutes of Health), a derivative of
BL21(λDE3) (Studier, F. W. and Moffatt, B. A. J. Mol. Biol.
189:113-150, 1986). This strain has the T7 RNA polymerase
gene under control of an inducible lac promoter and also
contains the degP mutation, which eliminates a major
periplasmic protease (Strauch et al. J. Bacteriol. 171:2689-2696,
1989).
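The three-fragment ligation works because every junction joins two ends cut by the same enzyme (KpnI, MluI, EcoRI), and because each insert carries two different sites, its orientation is fixed. The Python sketch below is a simplification under stated assumptions: it ignores overhang chemistry, cross-compatible ends from different enzymes, and dephosphorylation, and the fragment labels are illustrative. It only checks that a proposed circular order of fragments closes with matching ends.

```python
from typing import List, Tuple

# (fragment label, site at its 5' end, site at its 3' end), listed in the
# order the fragments are expected to join around the circular plasmid.
Fragment = Tuple[str, str, str]


def ligation_is_compatible(fragments: List[Fragment]) -> bool:
    """True if every junction, including the one closing the circle, joins
    two ends cut by the same restriction enzyme."""
    n = len(fragments)
    for i in range(n):
        _, _, right_end = fragments[i]
        _, left_end, _ = fragments[(i + 1) % n]  # wraps around to close the ring
        if right_end != left_end:
            return False
    return True


lf_pe_assembly = [
    ("LF PCR product", "KpnI", "MluI"),
    ("PE PCR product", "MluI", "EcoRI"),
    ("pVEX115 f+T backbone", "EcoRI", "KpnI"),
]
print(ligation_is_compatible(lf_pe_assembly))  # True
```

Swapping the PE fragment's ends (EcoRI first) makes the check fail, which mirrors why the design clones directionally.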

In the resulting plasmids, the LF-PE fusion genes are
under control of the T7 promoter and contain an OmpA signal
peptide that directs secretion of the products to the periplasm
so as to facilitate purification. The design of the PCR
linkers also led to insertion of two non-native amino acids,
Thr-Arg, at the LF-PE junction. The four fusions analyzed in
this report contain the entire 776 amino acids of mature LF,
the two added residues TR (Thr-Arg), and varying portions of
PE. In fusion FP33, the carboxyl-terminal end of PE was
changed from the native REDLK (Arg-Glu-Asp-Leu-Lys) to LDER, a
sequence that fails to cause retention in the ER (endoplasmic
reticulum).

Expression and Purification of Fusion Proteins 

Fusion proteins produced from pNA2, pNA4, pNA23 and
pNA33 were designated FP2, FP4, FP23 and FP33, respectively.
E. coli strains carrying the recombinant plasmids were grown
in super broth (32 g/L tryptone, 20 g/L yeast extract, 5 g/L
NaCl, pH 7.5) with 100 µg/ml ampicillin, with shaking at
225 rpm at 37°C, in 2-L cultures. When A600 reached 0.8-1.0,
isopropyl-1-thio-β-D-galactopyranoside was added to a final
concentration of 1 mM, and cultures were incubated for an
additional 2 hr. EDTA and 1,10-phenanthroline were added to
5 mM and 0.1 mM, respectively, and the bacteria were harvested
by centrifugation at 4000 x g for 15 min at 4°C. For
extraction of the periplasmic contents, cells were suspended
in 75 ml of 20% sucrose containing 30 mM Tris and 1 mM EDTA,
incubated at 0°C for 10 min, and centrifuged at 8000 x g for
15 min at 4°C. Cells were resuspended gently in 50 ml of cold
distilled water and kept on ice for 10 min, and the spheroplasts
were pelleted. The supernatant was concentrated with
Centriprep-100 units (Amicon) and loaded on a Sephacryl S-200
column (40 x 2 cm), and 1-ml fractions were collected.
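The super broth recipe is given per litre, while the cultures were 2 L. A small Python sketch of the scaling, using only the concentrations stated above; it does not model the pH adjustment, and the dictionary keys are illustrative labels.

```python
# Per-litre super broth composition from the text (g/L).
SUPER_BROTH_G_PER_L = {"tryptone": 32.0, "yeast extract": 20.0, "NaCl": 5.0}
AMPICILLIN_UG_PER_ML = 100.0  # µg/ml is numerically equal to mg/L


def medium_for_volume(volume_l: float) -> dict:
    """Grams of each dry component and mg of ampicillin for volume_l litres."""
    recipe = {f"{name} (g)": g_per_l * volume_l
              for name, g_per_l in SUPER_BROTH_G_PER_L.items()}
    recipe["ampicillin (mg)"] = AMPICILLIN_UG_PER_ML * volume_l
    return recipe


print(medium_for_volume(2.0))
# {'tryptone (g)': 64.0, 'yeast extract (g)': 40.0, 'NaCl (g)': 10.0, 'ampicillin (mg)': 200.0}
```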

Fractions having full-length fusion protein, as
determined by immunoblots, were pooled and concentrated as
above. Protein was then purified on an anion exchange column
(MonoQ HR5/5, Pharmacia-LKB) using a NaCl gradient. The
fusion proteins eluted at 280-300 mM NaCl. The proteins were
concentrated again on Centriprep-100 (Amicon Division) and the
MonoQ chromatography was repeated. Protein concentrations
were determined by the bicinchoninic acid method (BCA Protein
Assay Reagent, Pierce), using bovine serum albumin as the
standard. Proteins were analyzed by polyacrylamide gel
electrophoresis in the presence of sodium dodecyl sulfate
(SDS). Gels were either stained with Coomassie Brilliant Blue
or the proteins were electroblotted to nitrocellulose paper,
which was probed with polyclonal rabbit antisera to LF or PE
(List Biological Laboratories, Campbell, CA). To determine
the percentage of full-length protein, SDS gels stained with
Coomassie Brilliant Blue were scanned with a laser
densitometer (Pharmacia-LKB Ultrascan XL).

The proteins migrated during gel electrophoresis with
molecular masses of more than 106 kDa, consistent with the
expected sizes, and immunoblots confirmed that the products
reacted with antisera to both LF and PE. The fusion
proteins differed in their susceptibility to proteolysis, as
judged by the appearance of smaller fragments on immunoblots,
and this led to varying yields of final product. Thus, from
2-L cultures the yields were FP2, 27 µg; FP4, 87 µg; FP23,
18 µg; and FP33, 143 µg.

Cell Culture Techniques and Protein Synthesis Inhibition Assay
CHO cells were maintained as monolayers in Eagle's
minimum essential medium (EMEM) supplemented with 10% fetal
bovine serum, 10 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic
acid (HEPES) (pH 7.3), 2 mM
glutamine, penicillin/streptomycin, and non-essential amino
acids (GIBCO/BRL). Cells were plated in 24- or 48-well dishes
one day before the experiment. After overnight incubation,
the medium was replaced with fresh medium containing 1 µg/ml
of PA unless otherwise indicated. Fusion proteins were added
to 0.1-1000 ng/ml. All data points were done in duplicate.
Cells were further incubated for 20 hr at 37°C in a 5% CO2
atmosphere. The medium was then aspirated and cells were
incubated for 2 hr at 37°C with leucine-free medium containing
1 µCi/ml [3H]leucine. Cells were washed twice with medium,
cold 10% trichloroacetic acid was added for 30 min, and the cells
were washed twice with 5% trichloroacetic acid and dissolved
in 0.150 ml of 0.1 M NaOH. Samples were counted in a Pharmacia-LKB
1410 liquid scintillation counter. In experiments to
determine whether the toxin is internalized through acidified
endosomes, 1 µM monensin (Sigma) was added 90 min prior to
toxin and was present during all subsequent steps. To verify
that the fusion proteins were internalized through the PA
receptor, competition with native LF was carried out. PA
(0.1 µg/ml) and LF (0.1-10,000 ng/ml) were added to the CHO cells
to block the PA receptor, and the fusion proteins were added
thereafter at concentrations of 100 ng/ml for FP4 and FP23 and
5 ng/ml for FP33. Protein synthesis inhibition was measured
after 20 hr as described above.
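The text does not state how EC50 values were derived from the [3H]leucine counts. One common approach, shown here only as an assumption and not as the method used in this work, is to express each duplicate mean as a percentage of the untreated (PA-only) control and interpolate linearly on log10(dose) between the two doses that bracket 50%. The dose-response numbers in the usage lines are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List


def percent_of_control(cpm: List[float], control_cpm: List[float]) -> float:
    """Mean incorporation of duplicate wells as a percentage of the control."""
    return 100.0 * mean(cpm) / mean(control_cpm)


def ec50_log_interpolated(dose_response: Dict[float, float]) -> float:
    """Dose (ng/ml) giving 50% of control, interpolated on log10(dose).

    Assumes the response decreases with dose and that 50% is bracketed
    by two tested doses.
    """
    doses = sorted(dose_response)
    for low, high in zip(doses, doses[1:]):
        y_low, y_high = dose_response[low], dose_response[high]
        if y_low >= 50.0 >= y_high:
            if y_low == y_high:
                return low
            frac = (y_low - 50.0) / (y_low - y_high)
            log_ec50 = math.log10(low) + frac * (math.log10(high) - math.log10(low))
            return 10 ** log_ec50
    raise ValueError("50% of control is not bracketed by the tested doses")


# Hypothetical duplicate counts and a hypothetical dose-response curve.
print(percent_of_control([500, 520], [1000, 1040]))                 # 50.0
print(ec50_log_interpolated({0.1: 95.0, 1.0: 70.0, 10.0: 30.0, 100.0: 5.0}))
```

Interpolating on the logarithm of dose matches the logarithmic spacing of the 0.1-1000 ng/ml series; a fitted sigmoidal model would be a reasonable alternative.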

Cytotoxic Activity of the Fusion Proteins 

All four fusion proteins made and purified were toxic
to CHO cells. The EC50 values (concentrations causing 50%
inhibition of protein synthesis in cultured cells) were 350, 8,
10, and 0.2 ng/ml for FP2, FP4, FP23 and FP33, respectively
(Table 1). These assays were done with PA present at 1 µg/ml,
exceeding the Km of 0.1 µg/ml (100 pM). The fusion proteins had no
toxicity even at 1 µg/ml when PA was omitted, showing that
internalization of the fusion proteins occurred through
the action of PA and the PA receptor. Native LF has
previously been shown to have no short-term toxic effects on
CHO cells when added with PA, and therefore was not included
in these assays. The fusion protein having only domain III
and an altered carboxyl terminus (FP33) was the most active,
whereas the one having intact domains II and III and the
native REDLK terminus (FP2) was the least active. The other two
fusion proteins (FP4 and FP23) had intermediate potencies.
Among proteins having ADP-ribosylation activity,
potencies equalling or exceeding 1 pM have previously been
found only for native diphtheria and Pseudomonas toxins acting
on selected cells (Middlebrook, J. L. and Dorian, R. B. Can. J.
Microbiol. 23:183-189, 1977) and for fusion proteins of PE and
diphtheria toxin when tested on cells containing > 100,000
receptors for the ligand-recognition domain of the fusion
(EGF, transferrin, etc.) (Pastan, I. and FitzGerald, D.
Science 254:1173-1177, 1991; Middlebrook et al. 1977). For
CHO cells, the potency of FP33 (EC50 = 2 pM) is higher than
that of PE itself (EC50 = 420 pM), even though CHO cells
probably have similar numbers of receptors for both PA and PE
(approx. 5,000-20,000). If the intracellular trafficking of
native PE delivers less than 5% of the molecules to the
cytosol, then the 200-fold greater potency of FP33 suggests
that the PA/LF system has an inherently high efficiency of
delivery to the cytosol.
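Table 1 reports EC50 values both in ng/ml and in pM. Converting between them only needs a molecular mass: 1 ng/ml = 1e-6 g/L, so c(pM) = 1000 × c(ng/ml) / M(kDa). The sketch below estimates the FP33 mass from its residue content in Table 1 using an assumed average of 110 Da per residue (an approximation, not a measured value), then compares the result with the tabulated 2 pM and reproduces the roughly 200-fold comparison with PE.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumed average residue mass; real composition differs


def residue_count(lf_residues: int, linker: str, pe_start: int, pe_end: int) -> int:
    """Residues in an LF-linker-PE fusion; the PE range is inclusive."""
    return lf_residues + len(linker) + (pe_end - pe_start + 1)


def ng_per_ml_to_pM(conc_ng_per_ml: float, mass_kda: float) -> float:
    """Convert a mass concentration to picomolar: 1000 * c(ng/ml) / M(kDa)."""
    return 1000.0 * conc_ng_per_ml / mass_kda


fp33_residues = residue_count(776, "TR", 362, 612)         # 1029 residues
fp33_kda = fp33_residues * AVG_RESIDUE_MASS_DA / 1000.0   # about 113 kDa
print(round(ng_per_ml_to_pM(0.2, fp33_kda), 2))            # 1.77, cf. 2 pM in Table 1

# Potency ratio quoted in the text, from the molar EC50 values in Table 1.
print(420 / 2)  # 210.0, i.e. roughly 200-fold
```

The small gap between 1.77 pM and the tabulated 2 pM reflects the crude mass estimate and the rounding of the reported values.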

A comparison of the potencies of the four fusion
proteins shows that inclusion of domain II decreases potency.
Thus, the fusion with the lowest potency, FP2, was the one
containing intact domains II, Ib, and III. In designing the
fusion proteins, all or part of PE domains II and Ib was
included in several of the constructs because it could not be
assumed that the translocation functions possessed by PA and
LF would be able to correctly traffic PE domain III to the
cytosol. The combination of domains II, Ib, and III, termed
PE40, has been used in a large number of toxic hybrid
proteins, by fusion to growth factors, monoclonal antibodies,
and other proteins (Pastan et al. 1991; Oeltmann, T. N. and
Frankel, A. E. FASEB J. 5:2334-2337, 1991), and some of these
fusions have shown substantial potency. Domain II was found
to be essential in these hybrid proteins to provide a
translocation function not present in the receptor-binding
domain to which it was fused. The potency of many of these
PE40 fusion proteins appears to require that they be
trafficked through the Golgi and ER and proteolytically
activated in the same manner as native PE, so as to achieve
delivery of domain III to the cytosol. The fact that
inclusion of the entire domain II in the LF fusion protein FP2
instead decreased activity suggests that internalization of
the LF fusions occurs through a different route, one that does
not easily accommodate all the sequences in domain II.
Evidence that structures within PE residues 251-278
inhibit translocation of the LF fusions comes from the 35-fold
lower potency of FP2 compared to FP23. One structure that
might inhibit translocation of the fusions is the disulfide
loop formed by Cys265 and Cys287. In native PE, this
disulfide loop appears to be required for maximum activity.
Thus, native PE and TGF-α-PE40 fusions become 10- to 100-fold
less toxic if one or both of these cysteines are changed to
serine. The disulfide loop probably acts to constrain the
polypeptide so that Arg276 and Arg279 are susceptible to the
intracellular protease involved in the cleavage that precedes
translocation. In contrast, the disulfide loop decreases the
potency of the LF fusions, perhaps by preventing the unfolding
needed for passage through a protein channel, thereby acting
in this situation as a "stop transfer" sequence. FP23, which
lacks Cys265, would not contain the domain II disulfide, and
therefore would not be subject to this effect. LF, like PA
and EF, contains no cysteines, and would not be prevented by
disulfide loops from the complete unfolding needed to pass
through a protein channel. The suggestion that disulfide
loops act as stop-transfer signals would predict that the
disulfide Cys372-Cys379 in PE domain Ib, which is retained in
all four LF fusions, would also decrease potency. It should be
noted that neither the fusions made here nor the PE40 fusions
have been analyzed chemically to determine whether the disulfides
in domains II and III are actually formed. If the disulfides
do form correctly, it would be predicted that the potencies of
all of the fusion proteins, and especially that of FP2, would
be increased by treatment with reducing agents. These
analyses have not yet been performed. This analysis also
suggests that future LF fusions might be made more potent by
omission of domain Ib.
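The argument above turns on which PE disulfides each fusion can carry, which follows directly from the PE residue ranges in Table 1 and the two cysteine pairs named in the text. The Python sketch below checks only whether both cysteines of a pair lie inside the fragment; as the text notes, whether the bond actually forms was not determined.

```python
# PE residue ranges (inclusive) carried by each LF fusion, from Table 1.
PE_FRAGMENTS = {
    "FP2": (251, 613),
    "FP4": (362, 613),
    "FP23": (279, 613),
    "FP33": (362, 612),
}
# Disulfides discussed in the text, as native PE residue numbers.
DISULFIDES = {"domain II": (265, 287), "domain Ib": (372, 379)}


def retained_disulfides(start: int, end: int) -> list:
    """Names of disulfides whose two cysteines both fall in start..end."""
    return [name for name, (c1, c2) in DISULFIDES.items()
            if start <= c1 <= end and start <= c2 <= end]


for fusion, (start, end) in PE_FRAGMENTS.items():
    print(fusion, retained_disulfides(start, end))
# FP2 ['domain II', 'domain Ib']
# FP4 ['domain Ib']
# FP23 ['domain Ib']
# FP33 ['domain Ib']
```

Only FP2 can form the Cys265-Cys287 loop, while the Cys372-Cys379 pair is present in all four fusions, which is the pattern the stop-transfer hypothesis relies on.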

The other structural feature of PE known to affect
intracellular trafficking is the carboxyl-terminal sequence,
REDLK, which specifies retention in the ER (Chaudhary et al.
1990; Muro et al. 1987). To determine whether the trafficking of
the LF fusion proteins was similar to that of PE, two of the
fusion proteins were designed so as to differ only in the
terminal sequence. Replacement of the native sequence by
LDER, one that does not function as an ER retention signal,
produced the most toxic of the four fusion proteins, FP33.
FP4, identical except that it retained a functional REDLK
sequence, was 30-fold less potent. These data suggest that
sequestration of the REDLK-ended fusions decreased their
access to cytosolic EF-2. The implication is that PE may
require the REDLK terminus to be delivered to the ER for an
obligatory processing step, but then be limited in its final
toxic potential by sequestration from its cytosolic target.
Finally, this comparison strongly argues that internalization
of the LF fusions does not follow the same path as PE.
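The fold differences cited in this discussion (about 35-fold for FP2 versus FP23, about 30-fold for FP4 versus FP33) are ratios of Table 1 EC50 values. Because the molecular masses differ slightly, the ratio depends on whether the molar or the mass columns are used; the sketch below computes both so the rounded figures in the text can be checked.

```python
# EC50 values from Table 1 as (pM, ng/ml).
EC50 = {
    "PE": (420, 23),
    "FP2": (2700, 350),
    "FP4": (65, 8),
    "FP23": (70, 10),
    "FP33": (2, 0.2),
}


def fold_less_potent(weaker: str, stronger: str) -> tuple:
    """EC50 ratio (weaker / stronger) on a molar and on a mass basis."""
    (pm_w, ng_w), (pm_s, ng_s) = EC50[weaker], EC50[stronger]
    return pm_w / pm_s, ng_w / ng_s


print(fold_less_potent("FP2", "FP23"))  # about (38.6, 35.0)
print(fold_less_potent("FP4", "FP33"))  # about (32.5, 40.0)
```

The 35-fold figure matches the mass-based ratio, and the 30-fold figure lies close to the molar ratio of 32.5.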

In designing the fusion proteins described here, it was
hoped that they would have cytotoxic activity against cells
that are unaffected by anthrax lethal toxin, and this was
successfully realized, as shown by the data obtained with CHO
cells. However, prior knowledge about LF did not provide a
basis for predicting whether the constructs would retain
toxicity toward mouse macrophages, the only cells known to be
rapidly killed by anthrax lethal toxin. Macrophages are lysed
by lethal toxin in 90-120 minutes, long before any inhibition
of protein synthesis resulting from ADP-ribosylation of EF-2
leads to decreases in membrane integrity or viability. This
kinetic difference made it possible to test directly for LF
action. As discussed above, the fusion proteins purified to
remove the ≈89-kDa LF species formed by proteolysis were not
toxic to J774A.1 macrophages. This shows that attachment of a
bulky group to the carboxyl terminus of LF eliminates its
normal toxic activity. In the absence of any assay for the
putative catalytic activity of LF, it is not possible to
determine the cause of the loss of LF activity. The inability
of the fusions to lyse J774A.1 cells also argues against
proteolytic degradation of the fusions either in the medium
during incubation with cells or after internalization.
An important result of the invention described here is
the demonstration that the anthrax toxin proteins constitute
an efficient mechanism for protein internalization into animal
cells. The high potency of the present fusion proteins argues
that this system is inherently efficient, as well as being
amenable to improvement. The high efficiency results in part
from the apparent direct translocation from the endosome,
without a requirement for trafficking through other
intracellular compartments. In addition to its efficiency,
the system appears able to tolerate heterologous polypeptides.
Macrophage Lysis Assay of Fusion Proteins 

Fusion proteins were assayed for LF functional
activity on the J774A.1 macrophage cell line in the presence of
1 µg/ml PA. One day prior to use, cells were scraped from
flasks and plated in 48-well tissue culture dishes. For
cytotoxicity tests, the medium was aspirated and replaced with
fresh medium containing 1 µg/ml PA and the LF fusion proteins,
and the cells were incubated for 3 hr. All data points were
performed in duplicate. To measure the viability of the
treated cells, 3-[4,5-dimethylthiazol-2-yl]-2,5-diphenyltetrazolium
bromide (MTT) was added to the cells to a
final concentration of 0.5 mg/ml, and incubation was continued
for an additional 45 min to allow the uptake and reduction of
MTT by viable cells. The medium was aspirated and replaced by
200 µl of 0.5% SDS, 40 mM HCl, 90% isopropanol, and the plates
were vortexed to dissolve the blue pigment. The MTT
absorption was read at 570 nm using a UVmax Kinetic Microplate
Reader (Molecular Devices Corp.).
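The text reports MTT absorbance at 570 nm but not how readings were converted to viability. A common convention, assumed here and not taken from the source, is to subtract a cell-free blank and express treated wells as a percentage of untreated (PA-only) wells, averaging duplicates. The readings in the usage line are hypothetical.

```python
from statistics import mean
from typing import Sequence


def percent_viability(treated: Sequence[float], untreated: Sequence[float],
                      blank: Sequence[float]) -> float:
    """Background-subtracted A570 of treated wells as % of untreated wells.

    blank: wells without cells (solubilization reagent only); an assumed control.
    """
    background = mean(blank)
    return 100.0 * (mean(treated) - background) / (mean(untreated) - background)


# Hypothetical duplicate A570 readings.
print(percent_viability([0.55, 0.55], [1.05, 1.05], [0.05, 0.05]))  # about 50
```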

The crude periplasmic extracts from which the fusion
proteins were purified caused lysis of J774A.1 macrophages
when added with PA, indicating the presence of active LF
species, probably formed by proteolysis of the fusion
proteins. Purification removed this activity, so that none of
the final fusion proteins lysed macrophages. This result
showed both that the purified proteins were devoid of full-size
LF or active LF fragments, and that the lytic activity of
LF for macrophages is blocked when residues from PE are fused
at its carboxyl terminus.
ADP-Ribosylation Assays

For assaying ADP-ribosylation activity, the method of
Collier and Kandel (Collier, R. J. and Kandel, J. J. Biol.
Chem. 246:1496-1503, 1971) was used with some modification. A
wheat germ extract enriched for EF-2 was used in the reaction.
Briefly, in a 200-µl reaction, 20 µl of buffer
(500 mM Tris, 10 mM EDTA, 50 mM dithiothreitol and
10 mg/ml bovine serum albumin) was mixed with 30 µl of EF-2,
130 µl of H2O or sample, and 20 µl of [adenylate-32P]NAD (0.4
µCi per assay, ICN Biochemicals) containing 5 µM non-radioactive
NAD. Samples were incubated for 20 min at 23°C,
the reactions were stopped by adding 1 ml of 10% trichloroacetic
acid, and the precipitates were collected and washed on GA-6
filters (Gelman Sciences). The filters were washed twice with
70% ethanol and air dried, and the radioactivity was measured.
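The four additions (20 + 30 + 130 + 20 µl) account for the full 200-µl reaction, so the buffer is diluted tenfold. The sketch below applies C1V1 = C2V2 to the buffer components, taking the listed concentrations to be those of the 20-µl buffer addition, as the text states. It does not compute a final NAD concentration, because the text does not make clear whether the 5 µM figure refers to the stock or to the reaction.

```python
# Volumes (µl) of each addition, from the text.
ADDITIONS_UL = {"buffer": 20, "EF-2 extract": 30, "water or sample": 130, "[32P]NAD": 20}
# Concentrations in the 20-µl buffer addition.
BUFFER_STOCK = {"Tris (mM)": 500, "EDTA (mM)": 10, "dithiothreitol (mM)": 50, "BSA (mg/ml)": 10}


def final_buffer_concentrations():
    """Total volume and in-reaction concentrations via C1*V1 = C2*V2."""
    total_ul = sum(ADDITIONS_UL.values())
    dilution = ADDITIONS_UL["buffer"] / total_ul  # 20/200 = 0.1
    return total_ul, {name: conc * dilution for name, conc in BUFFER_STOCK.items()}


total, final = final_buffer_concentrations()
print(total)  # 200
print(final)  # Tris 50 mM, EDTA 1 mM, DTT 5 mM, BSA 1 mg/ml (as floats)
```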

Table 1 shows that all the fusion proteins had comparable
ADP-ribosylation activity toward EF-2 (82-118% of that of PE).
FP2, which had little cytotoxic activity on CHO cells, still
retained full ADP-ribosylation activity. It was also found that
treatment with urea and dithiothreitol, under conditions that
activate the enzymatic activity of native PE, caused no increase in the
ADP-ribosylation activity of the fusion proteins, suggesting
that the proteins were not folded so as to sterically block
the catalytic site.

Effect of Mutant PA on LF-PE Activity

To verify that uptake of the fusion proteins requires
PA, the activity of the fusion proteins was measured in the
presence of a mutant PA which is apparently defective in
internalization. This mutant, PA-S395C, has a serine-to-cysteine
substitution at residue 395 of the mature protein,
and retains the ability to bind to receptor, become
proteolytically nicked, and bind LF, but is unable to lyse
macrophages. When PA-S395C was substituted for native PA in
combination with FP33, no inhibition of protein synthesis
was observed. Similar results were obtained when
the other three fusion proteins were tested in combination
with PA-S395C.
Effect of Monensin on Activity of the Fusion Proteins

To verify that internalization of the fusion proteins
was occurring by passage through acidified endosomes in the
same manner as native LF, the ability of monensin to protect
cells was examined. Addition of monensin to 1 µM decreased
the potency of FP33 by >100-fold. Protection against the
other three fusion proteins exceeded 20-fold.
LF Block of LF-PE Fusion Activity 

To further verify that the fusion proteins were
internalized through the PA receptor, CHO cells were incubated
with PA and different amounts of LF to block the receptor, and
the fusion proteins were added thereafter. Protein synthesis
inhibition assays showed that native LF competitively
blocked the LF-PE fusion proteins in a concentration-dependent
manner.

The present data suggest that the receptor-bound 63-kDa
proteolytic fragment of PA forms a membrane channel and
that regions at or near the amino termini of LF and EF enter
this channel first and thereby cross the endosomal membrane,
followed by unfolding and transit of the entire polypeptide to
the cytosol. This model differs from that for diphtheria
toxin in that the orientation of polypeptide transfer is
reversed. Since both EF and LF have large catalytic domains,
extending to near their carboxyl termini, it appears probable
that the entire polypeptide crosses the membrane. In the LF
fusion proteins, the attached PE sequences would be carried
along with the LF polypeptide in transiting the channel to the
cytosol. Thus, the PA63 protein channel must tolerate diverse
amino acid residues and sequences. The data presented are
consistent with the mechanism of direct translocation of the
LF proteins to the cytosol suggested herein.
TABLE 1 Cytotoxic and catalytic activity of LF-PE fusion proteins

           Amino acid content              Toxicity (EC50)(b)   ADP-ribosylation
Protein    LF      Linker    PE            pM        ng/ml      activity (relative)
PE         none    none      1-613         420       23         100(c)
FP2        776     TR        251-613       2700      350        82
FP4        776     TR        362-613       65        8          105
FP23       776     TR        279-613       70        10         108
FP33       776     TR        362-612(a)    2         0.2        118

(a) REDLK at the carboxyl terminus is changed to LDER.
(b) Data are from this example, except for native PE, which are
from data not shown and are equal to a value previously
reported (Moehring, T. J. and Moehring, J. M. Cell 11:447-454,
1977).
(c) ADP-ribosylation was measured using 30 ng of fusion protein
in a final volume of 0.200 ml with 5 µM NAD. Results were
corrected for the molecular weights of the proteins and
normalized to PE.
EXAMPLE 2: Residues 1-254 of Anthrax Toxin Lethal Factor are
Sufficient to Cause Cellular Uptake of Fused Polypeptides
Reagents and General Procedures 

Restriction endonucleases and DNA modifying enzymes
were purchased from GIBCO/BRL, Boehringer Mannheim or New
England Biolabs. Low-melting-point agarose (SeaPlaque) was
obtained from FMC Corporation. Oligonucleotides were
synthesized on a PCR Mate (Applied Biosystems) and purified
with Oligonucleotide Purification Cartridges (Applied
Biosystems). Polymerase chain reactions (PCR) were performed
on a thermal cycler (Perkin-Elmer Cetus) using reagents from
U.S. Biochemical Corp. or Perkin-Elmer Cetus. DNA was
amplified as described in Example 1. The DNA was sequenced to
confirm the accuracy of all of the constructs described in
this report. SEQUENASE version 2.0 from U.S. Biochemical
Corp. was used for the sequencing reactions, and DNA
sequencing gels were made with Gel Mix 8 from GIBCO/BRL.
[35S]dATPαS and L-[3,4,5-3H]leucine were purchased from
Dupont-New England Nuclear. Chinese hamster ovary (CHO) cells
were obtained from Michael Gottesman (NCI, NIH). J774A.1
macrophage cells were obtained from the American Type Culture
Collection.
Plasmid Construction 

Three types of LF protein constructs were made and
analyzed in this report. All the constructs were made by PCR
amplification of the desired sequences, using the native LF
gene as template. LF proteins deleted at the amino or
carboxyl terminus were constructed by a single PCR
amplification reaction that added restriction sites at the
ends for incorporation of the construct into the expression
vector. LF proteins deleted for one or more of the 19-amino-acid
repeats that comprise residues 308-383 were constructed
by ligating the products of two separate PCR reactions that
amplified the regions bracketing the deletion. The third
group of constructs comprised fusions of varying portions of the
amino terminus of LF with PE domains Ib and III. Like the
internally deleted LF proteins, these LF-PE fusions were also
made by ligation of two separate PCR products. In the latter
two types of constructs, the ligation of the PCR products
resulted in addition of a linker, ACGCGT, at the junction
points. This introduced two non-native residues, Thr-Arg,
between the fused domains. The PCR manipulations also added
three non-native amino acids, Met-Val-Pro, as an extension to
the native amino terminus of all the constructs described in
this report. Addition of this sequence is not likely to alter
the activity of the constructs (discussed below). It should
be noted that the LF-PE fusions described herein contain this
three-residue extension.
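The junction linker ACGCGT is also the recognition sequence of MluI, the enzyme used at these junctions, and read in frame as ACG CGT it encodes the Thr-Arg dipeptide noted above. The sketch uses a deliberately partial codon table containing only the two standard-code entries needed here; it is not a general translator.

```python
MLUI_SITE = "ACGCGT"  # MluI recognition sequence (cuts A^CGCGT)

# Partial standard codon table: only the entries needed for this example.
CODON_TABLE = {"ACG": "Thr", "CGT": "Arg"}


def translate_in_frame(dna: str) -> list:
    """Translate codon by codon; the length must be a multiple of 3."""
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


print(translate_in_frame(MLUI_SITE))  # ['Thr', 'Arg']
```

Because the six-base site is exactly two codons, keeping it in frame adds exactly two residues at each junction.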

For PCR reactions to make deletions of 40 and 78 amino
acids from the amino terminus of LF, two different mutagenic
oligonucleotide primers were made which were substantially
identical to the LF gene template at the intended new termini,
and which added KpnI sites at their 5' ends. Another
(non-mutagenic) oligonucleotide primer for introduction of a
BamHI site at the 3' end of LF was prepared. Similarly, to
make deletions at the carboxyl terminus of LF, two different
mutagenic primers were used which truncated LF at residues 729
and 693 and introduced a BamHI site next to the new 3' ends of
the LF gene. A second (non-mutagenic) oligonucleotide primer
specific for the amino terminus of LF was made which
introduced a KpnI site at the 5' end of the gene. All of the
primers noted above were used in PCR reactions on a pLF7
template (Robertson and Leppla, 1986) to synthesize DNA
fragments having KpnI and BamHI sites at their 5' and 3' ends,
respectively. The amplified LF DNAs containing the amino- and
carboxyl-terminal deletions were digested with the appropriate
restriction enzymes. The expression vector pVEX115f+T
(provided by V. K. Chaudhary, NCI, NIH) was cleaved
sequentially with KpnI and BamHI and dephosphorylated. This
expression vector contains a T7 promoter, an OmpA signal
sequence for protein transport to the periplasm, a multiple
cloning site that includes KpnI and BamHI sites, and a T7
transcription terminator. The LF and pVEX115f+T DNA fragments
were purified from low-melting-point agarose, ligated
overnight, and transformed into E. coli DH5α. Transformants
were screened by restriction digestion to identify the desired
recombinant plasmids. Proteins produced by these constructs
are designated according to the amino acid residues retained;
for example, the LF truncated at residue 693 is designated
LF1-693. All of the mutant LF proteins described above contain
three non-native amino acids, Met-Val-Pro, added to the amino
terminus as a result of the PCR manipulations.

To analyze the role of the repeat region of LF, four
different constructs were made: (1) removal of the entire
repeat region (LF1-307.TR.LF384-776), (2) removal of the first
repeat (LF1-307.TR.LF327-776), (3) removal of the last repeat
(LF1-364.TR.LF384-776), and (4) removal of repeats 2-4
(LF1-326.TR.LF384-776). To construct LF1-307.TR.LF384-776, four
different primers were used in two separate PCR reactions. To
amplify LF1-307, one oligonucleotide primer was made at the 5'
end of the LF gene which added a KpnI site, and a second
primer was constructed at the end of residue 307, introducing
an MluI site. For amplifying LF384-776, a third primer was
made at residue 384 with an added MluI site, and the fourth
primer was made at residue 776, which introduced a BamHI
site at the end. Two PCR amplifications were done using
primers one/two and three/four with pLF7 as template
(Robertson and Leppla, 1986). The first amplification
reaction was digested with KpnI and MluI separately, and the
second amplification reaction was digested with MluI and
BamHI. The expression vector pVEX115f+T was digested
separately with KpnI and BamHI and dephosphorylated. All
three fragments were gel purified, ligated overnight at 16°C,
and transformed into E. coli DH5α. The other three constructs
were made by similar strategies. Oligonucleotide primers one
and four were the same for all four constructs, whereas
primers two and three were changed accordingly. All four
constructs contain Met-Val-Pro at the amino terminus of LF and
Thr-Arg at the site of the repeat region deletion.
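The construct boundaries can be cross-checked against the repeat region. Residues 308-383 span 76 residues, exactly four 19-residue repeats. The sketch below assumes the repeats are contiguous and of equal length with no spacers, an inference from 76 = 4 × 19 and the construct endpoints rather than something the text states. It then reports how many residues, and which whole repeats, each LF(1-a).TR.LF(b-776) construct removes.

```python
REPEAT_REGION = (308, 383)  # LF residues comprising the repeats (inclusive)
REPEAT_LEN = 19


def repeat_bounds(region=REPEAT_REGION, size=REPEAT_LEN):
    """Split the region into consecutive, equal-length repeats (inclusive)."""
    start, end = region
    n, remainder = divmod(end - start + 1, size)
    if remainder:
        raise ValueError("region is not a whole number of repeats")
    return [(start + i * size, start + (i + 1) * size - 1) for i in range(n)]


def removed_repeats(left_end: int, right_start: int) -> list:
    """1-based indices of repeats lying wholly inside the deleted residues."""
    first, last = left_end + 1, right_start - 1
    return [k for k, (a, b) in enumerate(repeat_bounds(), start=1)
            if first <= a and b <= last]


CONSTRUCTS = {
    "LF1-307.TR.LF384-776": (307, 384),
    "LF1-307.TR.LF327-776": (307, 327),
    "LF1-364.TR.LF384-776": (364, 384),
    "LF1-326.TR.LF384-776": (326, 384),
}

for name, (left_end, right_start) in CONSTRUCTS.items():
    n_deleted = right_start - left_end - 1
    print(name, n_deleted, removed_repeats(left_end, right_start))
# LF1-307.TR.LF384-776 76 [1, 2, 3, 4]
# LF1-307.TR.LF327-776 19 [1]
# LF1-364.TR.LF384-776 19 [4]
# LF1-326.TR.LF384-776 57 [2, 3, 4]
```

Under this assumption, every construct endpoint falls exactly on a repeat boundary, which is consistent with the descriptions in the text.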

To construct LF-PE fusion proteins, fragments of the
LF gene extending from the amino terminus to various lengths
were amplified from plasmid pLF7 (Robertson and Leppla, 1986)
by PCR using a common oligonucleotide primer that added a KpnI
site at the 5' end and mutagenic primers which added MluI
sites at the intended new 3' ends. The PCR products of the LF
gene were digested with KpnI, the DNAs were precipitated, and
they were subsequently digested with MluI. Domains Ib and III of the PE
gene (provided by David FitzGerald, NCI, NIH) were amplified
by PCR using primers which added MluI and EcoRI sites at the
5' and 3' ends, respectively. The PCR product of PE was
digested with MluI and EcoRI. Similarly, the expression
vector pVEX115f+T was digested with KpnI and EcoRI. All DNA
fragments were purified from low-melting-point agarose gels,
three-fragment ligations were carried out, and the products
were transformed into E. coli DH5α. The three constructs
described in this example have 254, 198 and 79 amino acids of
LF joined with PE domains Ib and III. These fusion proteins
are designated LF1-254.TR.PE362-613 (SEQ ID NO:10),
LF1-198.TR.PE362-613, and LF1-79.TR.PE362-613, respectively. The
proteins retain the native carboxyl-terminal sequence of PE,
REDLK. It should be noted that these abbreviations do not
specify the entire amino acid content of the proteins, because
all the constructs also contain Met-Val-Pro, which was added
to the amino terminus of the LF domain by the PCR
manipulations.
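Because the construct names omit the Met-Val-Pro extension, the total residue content of each fusion can be tallied explicitly as Met-Val-Pro + LF(1-n) + Thr-Arg + PE(362-613). The sketch assumes the OmpA signal peptide is removed on export to the periplasm and therefore is not counted; the function name is illustrative.

```python
MVP = "MVP"    # Met-Val-Pro added at the amino terminus by the PCR manipulations
LINKER = "TR"  # Thr-Arg encoded by the ACGCGT junction


def fusion_length(lf_end: int, pe_start: int = 362, pe_end: int = 613) -> int:
    """Residues in MVP + LF(1-lf_end) + TR + PE(pe_start-pe_end), inclusive ranges."""
    return len(MVP) + lf_end + len(LINKER) + (pe_end - pe_start + 1)


for lf_end in (254, 198, 79):
    print(f"LF1-{lf_end}.TR.PE362-613:", fusion_length(lf_end))
# LF1-254.TR.PE362-613: 511
# LF1-198.TR.PE362-613: 455
# LF1-79.TR.PE362-613: 336
```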

Expression and Purification of Deleted LF and Fusion Proteins 

Recombinant plasmids were transformed into E. coli
SA2821 (provided by Sankar Adhya, NCI, NIH), a derivative of
BL21(λDE3) (Studier and Moffatt, 1986) that lacks the
proteases encoded by the lon, ompT, and degP genes and has
the T7 RNA polymerase gene under control of the lac promoter
(Strauch et al., 1989). Transformants were grown in super
broth with 100 µg/ml ampicillin, with shaking at 225 rpm,
at 37°C, in 2-L cultures. When A600 reached 0.8-1.0,
isopropyl-1-thio-β-D-galactopyranoside was added to a final
concentration of 1 mM, and cultures were incubated for an
additional 2 h. EDTA and 1,10-phenanthroline were added to
5 and 0.1 mM, respectively, and periplasmic protein was
extracted as described in Example 1. The supernatant fluids
were concentrated by Centriprep-30 units (Amicon), and proteins
were purified to near homogeneity by gel filtration
(Sephacryl S-200, Pharmacia-LKB) and anion exchange
chromatography (MonoQ, Pharmacia-LKB) as described in Example
1. To determine the percentage of full-length protein, SDS
gels stained with Coomassie Brilliant Blue were scanned with a
laser densitometer (Pharmacia-LKB Ultrascan XL). Western
blots were performed as described previously (Singh et al.,
1991).

The LF proteins having terminal deletions and the LF-PE
fusion proteins were obtained from periplasmic extracts and
purified to near homogeneity by gel filtration and anion
exchange chromatography. The migration of the proteins was
consistent with their expected molecular weights. Immunoblots
confirmed that the LF proteins reacted with LF
antisera, and that the LF-PE fusion proteins reacted with
both LF and PE antisera. Fusion proteins and terminally
deleted LF proteins differed in their susceptibility to
proteolysis, as judged by the appearance of peptide fragments
on the immunoblots, and this was also reflected in the
different amounts of purified proteins obtained. Thus, from
2-L cultures the yields of purified proteins were LF41-776,
39 µg; LF79-776, 32 µg; LF1-729, 50 µg; LF1-693, 46 µg;
LF1-254.TR.PE362-613, 184 µg; LF1-198.TR.PE362-613, 80 µg;
and LF1-79.TR.PE362-613, 12? µg.

LF proteins deleted in the repeat region were found to be unstable, and full-size product could not be purified. Therefore, the activities of these proteins were determined by assay of crude periplasmic extracts, and immunoblots were used to estimate the amount of full-size protein present.
Cytotoxicity on Macrophages of LF Proteins Having Terminal and Internal Deletions

Deleted LF proteins were assayed for LF functional activity on the J774A.1 macrophage cell line in the presence of native PA as described in Example 1. Briefly, cells were plated in 24- or 48-well dishes in Dulbecco's modified Eagle medium (DMEM) containing 10% fetal bovine serum and allowed to grow for 18 h. PA (1 µg/ml) and the mutant LF proteins were added, and cells were incubated for 3 h. To measure the viability of the treated cells, 3-[4,5-dimethylthiazol-2-yl]-2,5-diphenyltetrazolium bromide (MTT) was added to the cells to a final concentration of 0.5 mg/ml. After incubation for 45 min, the medium was aspirated, and cells were dissolved in 90% isopropanol, 0.5% SDS, 40 mM HCl and read at 540 nm using a UVmax Kinetic Microplate Reader (Molecular Devices Corp.).
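The text does not state how MTT readings were normalized. One common convention, assumed here, expresses the blank-corrected A540 of treated wells as a percentage of wells that received PA alone; the function name, the optional no-cell blank, and the readings below are hypothetical:

```python
from statistics import mean


def percent_viability(treated_a540, untreated_a540, blank_a540=0.0):
    """Blank-corrected mean A540 of treated wells as % of control wells.

    treated_a540 / untreated_a540: replicate readings per condition.
    blank_a540: optional no-cell reading to subtract (an assumption).
    """
    treated = mean(treated_a540) - blank_a540
    control = mean(untreated_a540) - blank_a540
    if control <= 0:
        raise ValueError("control signal must exceed the blank")
    return 100.0 * treated / control


# Hypothetical readings: PA + mutant LF wells vs. PA-only wells.
print(round(percent_viability([0.30, 0.32], [1.20, 1.18], blank_a540=0.10), 1))  # 19.3
```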
To determine the extent of essential sequences at the amino terminus of LF, the toxicities of the two LF proteins deleted at the amino terminus were measured in combination with PA in the macrophage lysis assay. Purified LF41-776 and LF79-776 were unable to lyse J774A.1 macrophage cells. This indicates that some portion of the sequence preceding residue 41 is needed to maintain an active LF protein.

To examine the role of the carboxyl terminus of LF, two proteins truncated in this region were prepared and analyzed. The proteins LF1-693 and LF1-729 were assayed on J774A.1 cells and found to be inactive. This is presumed to be due to inactivation of the putative catalytic domain.

To begin study of the role of the repeat region of LF, four constructs were made having deletions in this region. The proteins expressed from these mutants were unstable. Of the four deleted proteins, only LF1-307.TR.LF327-776 had immunoreactive material at the position expected of intact fusion protein. The amount of intact LF1-307.TR.LF327-776 was similar to that of native LF expressed in the same vector. When these unpurified periplasmic extracts were tested on J774A.1 macrophages, only the native LF control was toxic. LF1-307.TR.LF327-776 did not lyse macrophages even when present at a 50-fold higher concentration than native LF in the control crude periplasmic extract. Conclusions cannot be drawn about the toxicities of the other three constructs because full-size fusion proteins were not present in the periplasmic extracts.

Cell Culture Techniques and Protein Synthesis Inhibition Assay 
of Fusion Proteins 

CHO cells were maintained as monolayers in α-modified minimum essential medium (α-MEM) supplemented with 5% fetal bovine serum, 10 mM HEPES (pH 7.3), and penicillin/streptomycin. Protein synthesis assays were carried out in 24- or 48-well dishes as described in Example 1. CHO cells were incubated with PA (0.1 µg/ml) and varying concentrations of LF, which is expected to block the receptor. Fusion proteins were added at fixed concentrations, as follows: FP4, 100 ng/ml; FP23, 100 ng/ml; and FP33, 5 ng/ml. Cells were incubated for 20 h, and protein synthesis inhibition was evaluated by [3H]leucine incorporation.
Cytotoxicity of the LF-PE Fusion Proteins on CHO Cells

The use of fusion proteins provides a more defined method for measuring the translocation of LF, as demonstrated in Example 1, which showed that fusions of LF with domains Ib and III of PE are highly toxic. Translocation of these fusions is conveniently measured because domain III blocks protein synthesis by ADP-ribosylation of elongation factor 2. The new fusions containing varying portions of LF fused to PE domains Ib and III were designed to identify the minimum LF sequence able to promote translocation. The EC50 of LF1-254.TR.PE362-613 (SEQ ID NO: 10) was 1.7 ng/ml, whereas LF1-198.TR.PE362-613 and LF1-79.TR.PE362-613 did not kill 50% of the cells even at a 1200-fold higher concentration. Other constructs containing larger portions of LF fused to PE domains Ib and III were also made and analyzed, and these were found to be equal in potency to LF1-254.TR.PE362-613. These results show that residues 1-254 contain all the sequences essential for binding to PA63. The fusion proteins had no toxicity in the absence of PA, demonstrating that their internalization requires interaction with PA.
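The EC50 values reported above (for example, 1.7 ng/ml) are concentrations giving 50% inhibition of protein synthesis, but the fitting method is not stated. As one simple possibility, the sketch below interpolates linearly in log10(concentration) between the two measured points that bracket 50% of control; a sigmoidal curve fit is the usual alternative. The function name and data are hypothetical, and a result of None corresponds to a construct that never reaches 50% inhibition over the tested range:

```python
import math


def ec50_log_interp(concs, responses, level=50.0):
    """Concentration at which the response falls to `level` (% of control).

    concs: ascending positive concentrations (e.g. ng/ml).
    responses: % of untreated-control protein synthesis at each concentration.
    Interpolates linearly in log10(concentration) between the two points that
    bracket `level`; returns None if the response never falls to `level`.
    """
    points = list(zip(concs, responses))
    for (c1, r1), (c2, r2) in zip(points, points[1:]):
        if r1 == level:
            return c1
        if r1 > level >= r2:
            frac = (r1 - level) / (r1 - r2)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None


# Hypothetical dose-response data (ng/ml, % of control).
print(ec50_log_interp([0.1, 1.0, 10.0], [95.0, 70.0, 20.0]))
print(ec50_log_interp([1.0, 100.0, 1000.0], [98.0, 90.0, 75.0]))  # None
```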

Binding of Fusion Proteins and Deleted LF Proteins to PA

Binding of LF proteins to cell-bound PA was determined by competition with radiolabeled 125I-LF. Native LF was radiolabeled (3.1 x 10^6 cpm/µg protein) using the Bolton-Hunter reagent. Binding studies employed the L6 rat myoblast cell line, which has approximately twice as many receptors as the J774A.1 macrophage line (Singh et al., 1989). For convenience, cells were chemically fixed by a gentle procedure that preserves the binding activity of the receptor as well as the ability of the cell-surface protease to cleave PA to produce receptor-bound PA63. Assays were carried out in 24-well dishes using cells plated in DMEM with 10% fetal bovine serum one day before the experiment. Cell monolayers were washed twice with Hanks' balanced salt solution (HBSS) containing 25 mM HEPES and were chemically fixed for 30 min at 23°C in 10 mM N-hydroxysuccinimide and 30 mM 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide, in buffer containing 10 mM HEPES, 140 mM NaCl, 1 mM CaCl2, and 1 mM MgCl2. Monolayers were washed with HBSS containing 25 mM HEPES, and the fixative was inactivated by incubation for 30 min at 23°C in DMEM (without serum) containing 25 mM HEPES. Native PA was added at 1 µg/ml in minimum essential medium containing Hanks' salts, 25 mM HEPES, 1% bovine serum albumin, and a total of 4.5 mM NaHCO3. Cells were incubated overnight at room temperature to allow binding and cleavage of PA. Cells were washed twice in HBSS, and mutant LF proteins (0-5000 ng/ml) along with 50 ng/ml 125I-LF were added to each well. Cells were further incubated for 5 h, washed three times in HBSS, dissolved in 0.5 ml 1 N NaOH, and counted in a gamma counter (Beckman Gamma 9000).
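Because the specific activity of the labeled LF is given (3.1 x 10^6 cpm/µg), gamma-counter readings can be converted to the mass of 125I-LF bound, and competitor wells can be expressed relative to wells without competitor. A minimal sketch; how nonspecific binding was measured is not stated, so the background_cpm argument is an assumption, as is the absence of a radioactive-decay correction. Function names and counts are hypothetical:

```python
SPECIFIC_ACTIVITY_CPM_PER_UG = 3.1e6  # 125I-LF specific activity stated above


def ng_lf_bound(cpm, background_cpm=0.0,
                specific_activity=SPECIFIC_ACTIVITY_CPM_PER_UG):
    """Convert well counts to ng of 125I-LF bound (no decay correction)."""
    return (cpm - background_cpm) / specific_activity * 1000.0


def percent_of_uncompeted(cpm, uncompeted_cpm, background_cpm=0.0):
    """Specific 125I-LF binding in a competitor well as % of the no-competitor well."""
    specific_max = uncompeted_cpm - background_cpm
    if specific_max <= 0:
        raise ValueError("uncompeted binding must exceed background")
    return 100.0 * (cpm - background_cpm) / specific_max


# Hypothetical counts.
print(round(ng_lf_bound(6200.0), 2))  # 2.0
print(percent_of_uncompeted(600.0, 1100.0, background_cpm=100.0))  # 50.0
```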

Using this assay, the LF mutant proteins having amino-terminal deletions were found to be incapable of binding to PA, thereby explaining their lack of toxicity. The carboxyl-terminally deleted LF proteins did bind to PA in a dose-dependent manner, although they had slightly lower affinity than LF. The proteins deleted in the repeat region could not be tested for competitive binding because their instability prevented purification of intact protein.

The EC50 for LF1-254.TR.PE362-613 binding was found to be 220 ng/ml, which is similar to that of LF, 300 ng/ml. Therefore, the binding data correlate well with the toxicity of this construct. In contrast, neither LF1-198.TR.PE362-613 nor LF1-79.TR.PE362-613 bound to PA63 on cells, thereby explaining their lack of toxicity.

EXAMPLE 3: Construction of Genes Encoding PA Fusion Proteins
The genes encoding PA (or PA truncated at the carboxyl terminus to abrogate binding to the PA receptor) and an alternative targeting moiety (a single-chain antibody, growth factor, or other cell type-specific domain) are spliced using conventional molecular biological techniques. The PA gene is readily available, and the genes encoding alternative targeting domains are derived as described below.
Single-chain antibodies (sFv)
See Example 4, below.
Growth factors and other targeting proteins

The nucleotide sequences of genes encoding a number of growth factors and other proteins that are targeted to specific cell types or classes are reported in freely accessible databases (e.g., GenBank), and in many cases the genes are available. Where this is not the case, genes can be cloned from genomic or cDNA libraries using probes based on the known nucleotide sequence of the gene that codes for the growth factor, or derived from a partial amino acid sequence of the protein (see, e.g., Sambrook, supra). Alternatively, genes encoding the growth factor or other targeting moiety can be produced de novo from chemically synthesized overlapping oligonucleotides, using the preferred codon usage of the expression host. For example, the gene for human epidermal growth factor (urogastrone) was synthesized from the known amino acid sequence of human urogastrone using yeast-preferred codons. The cloned DNA, under control of the yeast GAPDH promoter and yeast ADH-1 terminator, expresses a product having the same properties as natural human urogastrone. The product of this synthetic gene is nearly identical to natural urogastrone, the only difference being that the product of the synthetic gene has a tryptophan at amino acid 13, whereas the natural protein has a tyrosine (Urdea et al. Proc. Natl. Acad. Sci. USA 80:7461-7465, 1983).
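De novo gene design of the kind described for urogastrone involves two computational steps: choosing a host-preferred codon for each residue, and dividing the resulting sequence into overlapping oligonucleotides. The Python sketch below illustrates both under stated assumptions; the codon table is a placeholder rather than a curated yeast codon-usage table, and the function names, fragment length, and overlap are hypothetical illustrative parameters:

```python
# Placeholder preferred-codon choices for illustration only; a real design
# would take these from the expression host's codon-usage data.
PREFERRED_CODON = {
    "M": "ATG", "N": "AAC", "S": "TCT", "D": "GAC", "E": "GAA",
    "C": "TGT", "P": "CCA", "L": "TTG", "*": "TAA",
}


def back_translate(protein):
    """Encode each residue with the single preferred codon for the host."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper())


def overlapping_fragments(gene, length=40, overlap=20):
    """Cut a gene into fragments of `length` nt whose ends overlap by
    `overlap` nt, as a starting layout for overlapping synthetic oligos."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and shorter than length")
    step = length - overlap
    fragments, start = [], 0
    while True:
        fragments.append(gene[start:start + length])
        if start + length >= len(gene):
            return fragments
        start += step


gene = back_translate("MNSDSECPL*")  # short hypothetical peptide
print(gene)
print(overlapping_fragments(gene, length=12, overlap=6))
```

In an actual assembly, alternate fragments would be written as reverse complements so that neighboring oligonucleotides anneal through their overlaps; that step is omitted here for brevity.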

Expression of PA Fusion Proteins

Once constructed, genes encoding PA-fusion proteins are expressed in Bacillus anthracis, and the recombinant proteins are purified by one of the following methods: (i) size-based chromatographic separation; or (ii) affinity chromatography. In the case of PA-sFv fusions, immobilized metal chelate affinity chromatography may be the purification method of choice, because the addition of a string of six histidine residues at the carboxyl terminus of the sFv should have no detrimental effect on binding to antigen. PA-fusion proteins can also be expressed using an in vitro rabbit reticulocyte lysate-based coupled transcription/translation system, which has been shown to correctly fold chimeric proteins consisting of an sFv fused to diphtheria toxin or Pseudomonas exotoxin A, as demonstrated in Examples 4 and 5.
Functional Testing of PA Fusion Proteins

After expression and purification, the functionality of PA-fusion proteins is tested by determining their ability to act in concert with an LF-PE fusion protein to inhibit protein synthesis in an appropriate cell line. Using a PA-anti-human transferrin receptor sFv fusion as a model, the following properties are examined: (i) cell type specificity (protein synthesis should be inhibited in cell lines that express the human transferrin receptor, but not in those that do not); (ii) independence of toxicity from PA receptor binding (excess free PA should have no effect on the toxicity of the PA-sFv/LF-PE complex); and (iii) competitive inhibition by excess free antibody (toxicity should be abrogated in the presence of excess sFv, or of the monoclonal antibody from which it was derived). Examples of such tests are described in Examples 4 and 5. These and other studies are used to confirm that PA has been successfully re-routed to an alternative receptor, permitting the use of the present anthrax toxin-based cell type-specific cytotoxic agents for the treatment of disease.

EXAMPLE 4: Generating Fusion Proteins with Single-Chain Antibodies

Reagents

Methionine-free rabbit reticulocyte lysate-based coupled transcription/translation reagents, recombinant ribonuclease inhibitor (rRNasin), and cartridges for the purification of plasmid DNA were purchased from Promega (Madison, WI). Tissue culture supplies were from GIBCO (Grand Island, NY) and Biofluids (Rockville, MD). OKT9 monoclonal antibody was purchased from Ortho Diagnostic Systems (Raritan, NJ). PCR reagents were obtained from Perkin-Elmer Cetus Instruments (Norwalk, CT), and restriction and nucleic acid modifying enzymes (including M-MLV reverse transcriptase) were from GIBCO-BRL (Gaithersburg, MD). A Geneclean kit for the recovery of DNA from agarose gels was supplied by BIO 101 (La Jolla, CA). Hybridoma mRNA was isolated using a FastTrack mRNA isolation kit (Invitrogen, San Diego, CA). All isotopes were purchased from Du Pont-New England Nuclear (Boston, MA), except [adenylate-32P]NAD, which was supplied by ICN Biomedicals (Costa Mesa, CA). Pseudomonas exotoxin A was obtained from List Biologicals (Campbell, CA). Oligonucleotides were synthesized on a dual-column Milligen-Biosearch Cyclone Plus DNA synthesizer (Burlington, MA) and purified using OPC cartridges (Applied Biosystems, Foster City, CA). DNA templates were sequenced using a Sequenase II kit (United States Biochemical Corp., Cleveland, OH), and SDS-polyacrylamide gel electrophoresis (PAGE) was performed using 10-20% gradient gels (Daiichi, Tokyo, Japan). After electrophoresis, gels were fixed in 10% methanol/7% acetic acid and soaked in autoradiography enhancer (Amplify, Amersham, Arlington Heights, IL). After drying, autoradiography was performed overnight using X-OMAT AR2 film (Eastman Kodak, Rochester, NY).
Plasmids 

The vector pET-11d is available from Novagen, Inc., Madison, WI. Plasmids were maintained and propagated in E. coli strain XL1-Blue (Stratagene, La Jolla, CA).

Cell Lines

K562, a human erythroleukemia-derived cell line [ATCC CCL 243] known to express high levels of the human transferrin receptor at the cell surface, was cultured in RPMI 1640 medium containing 24 mM NaHCO3, 10% fetal calf serum, 2 mM glutamine, 1 mM sodium pyruvate, 0.1 mM nonessential amino acids, and 10 µg/ml gentamicin. An African green monkey kidney line, Vero (ATCC CCL 81), was grown in Dulbecco's modified Eagle's medium (DMEM) supplemented as indicated above. The OKT9 hybridoma (ATCC CRL 8021), which produces a MoAb (IgG1) reactive with the human transferrin receptor, was maintained in Iscove's modified Dulbecco's medium containing 20% fetal calf serum, in addition to the supplements described above. All cell lines were cultured at 37°C in a 5% CO2 humidified atmosphere.

Construction of sFv from Hybridomas

Antibody VL and VH genes were cloned using a modification of a previously described technique (Larrick et al. BioTechniques 7:360, 1989; Orlandi et al. Proc. Natl. Acad. Sci. USA 86:3833, 1989; Chaudhary et al., 1990). Briefly, mRNA was isolated from 1 x 10^8 antibody-producing hybridoma cells, and approximately 3 µg was reverse transcribed with M-MLV reverse transcriptase, using random hexanucleotides as primers. The resulting cDNA was screened with two sets of PCR primer pairs designed to ascertain from which Kabat gene family the heavy and light chains were derived (Kabat et al. Sequences of Proteins of Immunological Interest, Fifth Edition, Bethesda, Maryland: U.S. Public Health Service, 1991). Having identified the most effective primer pairs, the cDNAs encoding VL and VH were spliced, separated by a region encoding a 15 amino acid peptide linker, using a previously described PCR technique known as gene splicing by overlap extension (SOE) (Johnson & Bird Methods Enzymol. 203:88, 1991). The sFv gene was then cloned into pET-11d, in frame and on the 5' side of the PE40 gene, such that expression of the construct should generate an sFv-PE40 fusion protein approximately 70 kDa in size.

Design of primers for PCR amplification of V region genes
The first and third complementarity determining regions (CDRs) of terminally rearranged immunoglobulin variable region genes are flanked by conserved sequences (the first framework region, FR1, on the 5' side of CDR1, and the fourth framework region, FR4, on the 3' side of CDR3). Although murine variable region genes have been successfully cloned, regardless of family, with just two pairs of highly degenerate primers (one pair for VL and another for VH) (Gussow et al. Cold Spring Harbor Symp. Quant. Biol. 54:265, 1989; Orlandi et al., 1989; Chaudhary et al., 1990; Batra et al., 1991), the method may not be effective in cases where the number of mismatches between the primers and the target sequence is extensive. With this in mind, using the Kabat database of murine V gene sequences, the present invention provides a set of ten FR1-derived primers (six for VL and four for VH), such that any of the database sequences selected at random would have a maximum of three mismatches with the most homologous primer. This set of primers can be used effectively to clone V region genes from a number of MoAb-secreting cell lines.
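The primer-design criterion above (no database sequence more than three mismatches from its most homologous primer) can be checked mechanically. A minimal sketch, assuming each primer and its FR1 target region are aligned without gaps and have equal length, and that degenerate positions follow the IUPAC nucleotide codes; the primer names, sequences, and function names are hypothetical:

```python
# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer, target):
    """Count positions where the target base is not allowed by the
    (possibly degenerate) primer base; gap-free, equal-length alignment."""
    if len(primer) != len(target):
        raise ValueError("primer and target region must be the same length")
    return sum(t not in IUPAC[p] for p, t in zip(primer.upper(), target.upper()))


def best_primer(primers, target):
    """Return (name, mismatch count) of the most homologous primer."""
    name = min(primers, key=lambda k: mismatches(primers[k], target))
    return name, mismatches(primers[name], target)


def set_covers(primers, targets, max_mismatches=3):
    """True if every target has some primer within `max_mismatches`."""
    return all(best_primer(primers, t)[1] <= max_mismatches for t in targets)


primers = {"P1": "GAYATTGTR", "P2": "CAGGTGCAG"}  # hypothetical primers
targets = ["GACATTGTG", "GATATAGTA"]              # hypothetical FR1 regions
print(best_primer(primers, targets[1]))  # ('P1', 1)
print(set_covers(primers, targets))      # True
```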

Assembly of the OKT9 sFv gene

mRNA isolated from the hybridoma secreting the OKT9 MoAb was converted to cDNA as described previously (Larrick et al., 1989; Orlandi et al., 1989; Chaudhary et al., 1990). Although CL-UNI was the partnering oligonucleotide in each case, a product of the required size (approximately 400 bp) was not produced by VL primers IV/VI, IIa, or IIb. This suggests that mismatches between these primers and the target sequence were too extensive to allow efficient amplification. A similar argument can be used to explain the failure of VH primers I and III to produce the required product. Primers VL-I/III and VH-V were clearly the most effective at amplifying the OKT9 VL and VH genes, respectively. The PCR-amplified OKT9 VL and VH genes were spliced together using the SOE technique, as previously described (Johnson & Bird, 1991). A synthetic DNA sequence encoding a 15 amino acid linker was inserted between the variable regions; this linker has been used very effectively in the production of functional sFv (Huston et al., 1991; Johnson & Bird, 1991) and appears to allow the variable chains to assume the optimum orientation for antigen binding. Following splicing of the V region genes by the SOE procedure, the DNA fragment encoding the OKT9 sFv was electrophoresed through a 1.5% agarose gel, purified by the Geneclean technique, digested with the appropriate pair of restriction enzymes, and cloned into the pET-11d expression vector in frame and on the 5' side of the PE40 gene.
In vitro expression of sFv-PE40 fusion proteins 

Plasmid templates were transcribed and translated using a rabbit reticulocyte lysate-based transcription/translation system, according to the instructions of the manufacturer, in a 96-well microtiter plate format. L-[35S]methionine-labeled proteins (for analysis by SDS-PAGE) and unlabeled proteins (for enzymatic analysis and bioassay) were produced under similar conditions, except that the isotope was replaced with 20 µM unlabeled L-methionine in the latter case. Control lysate was produced by adding all reagents except plasmid DNA. After translation, unlabeled samples were dialyzed overnight at 4°C against phosphate-buffered saline (PBS), pH 7.4, in Spectra/Por 6 tubing with a molecular weight cutoff (MWCO) of 50,000 (Spectrum, Houston, TX).

Constructs incorporating the aberrant kappa transcript will contain a translation termination codon in the VL chain, as previously described, and would therefore be expected to generate a translation product approximately 12 kDa in size. On the other hand, constructs that have incorporated the productive VL gene contain no such termination codon, and a full-length fusion protein (approximately 70 kDa in size) should be produced.
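The size difference between the aberrant and productive constructs follows from where translation stops. A minimal sketch, assuming the input begins at the initiation codon and using the rule-of-thumb average residue mass of about 110 Da; the sequence and function names are hypothetical. At that average, a 12 kDa product corresponds to roughly 109 residues:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average mass of an amino acid residue


def residues_before_stop(orf):
    """Number of codons translated before the first in-frame stop codon
    (the whole sequence if no stop is found); `orf` starts at the ATG."""
    orf = orf.upper()
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i // 3
    return len(orf) // 3


def approx_kda(n_residues):
    """Rough product size from residue count."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


# Hypothetical fragment with an in-frame TAG after four codons.
n = residues_before_stop("ATGGACATTGTGTAGCTGACC")
print(n, approx_kda(n))
```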

In vitro expression studies were used to determine the size of the protein encoded by the OKT9 sFv-PE40 gene. The constructs tested in this experiment clearly produced a protein of approximately 70 kDa, indicating that the clones do not contain the aberrant VL gene and are devoid of frameshift mutations. Of several OKT9 sFv constructs tested, none apparently incorporated the incorrect VL gene. However, in the case of another sFv generated by this method (1B7 sFv, derived from a MoAb that binds to pertussis toxin), the majority of the clones tested produced a 12 kDa protein and were found to contain the aberrant transcript on DNA sequencing. It should be noted that the 12 kDa fragment is frequently obscured in 10-20% gradient gels by unincorporated [35S]methionine, which co-migrates with the dye front.

Determination of Protein Concentration

The enzymatic activities of fusion proteins were compared with those of known concentrations of PE in an ADP-ribosyltransferase assay, allowing molarities to be determined (Johnson et al. J. Biol. Chem. 263:1295-1399, 1988). Samples were adjusted to contain equivalent concentrations of lysate, thus maintaining an identical amount of substrate (elongation factor 2) in all cases.
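Reading fusion-protein molarity off a PE standard curve can be sketched as an ordinary least-squares line through the standards, inverted at the sample's activity. This assumes that the ADP-ribosyltransferase signal is linear in the amount of catalytic domain over the range of the standards; the function names and all numbers are hypothetical:

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired standards")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def molarity_from_activity(sample_activity, std_nm, std_activity):
    """PE-equivalent concentration (nM) read off a linear PE standard curve."""
    slope, intercept = fit_line(std_nm, std_activity)
    return (sample_activity - intercept) / slope


# Hypothetical PE standards (nM) and ADP-ribosyltransferase signals (cpm).
standards_nm = [0.0, 1.0, 2.0, 4.0]
standards_cpm = [100.0, 1100.0, 2100.0, 4100.0]
print(molarity_from_activity(3100.0, standards_nm, standards_cpm))  # 3.0
```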
Protein Synthesis Inhibition Assay for Functional sFv-PE40 Binding

Binding of the OKT9 sFv to the human transferrin receptor was qualitatively determined by assessing the ability of the OKT9 sFv-PE40 fusion protein to inhibit protein synthesis in the K562 cell line. Pseudomonas exotoxin A is a bacterial protein that is capable of inhibiting de novo protein synthesis in a variety of eukaryotic cell types. The toxin binds to the cell surface and ultimately translocates to the cytosol, where it enzymatically inactivates elongation factor 2. PE40 is a mutant form of exotoxin A that lacks a binding domain but is enzymatically active and capable of translocation. Fusion proteins containing PE40 and an alternative binding domain (for example, an sFv to a cell surface receptor) will inhibit protein synthesis in an appropriate cell line only if the sFv binds to a cell-surface antigen that subsequently internalizes into an acidified endosome (Chaudhary et al., 1989). The transferrin receptor (TfnR) is such an antigen, so binding may be assessed qualitatively by measuring the ability of the OKT9 sFv-PE40 fusion protein to inhibit protein synthesis in a cell line, such as K562, that expresses the TfnR. Protein synthesis inhibition assays were performed as described previously (Johnson et al., 1988). Briefly, samples were serially diluted in ice-cold PBS, 0.2% BSA, and 11 µl volumes were added to the appropriate wells of a 96-well microtiter plate (containing 10^4 cells/100 µl/well in leucine-free RPMI 1640). After the contents of each well were carefully mixed, the plate was incubated for the indicated time at 37°C in a 5% CO2 humidified atmosphere. Each well was then pulsed with 20 µl of L-[14C(U)]leucine (0.1 µCi/20 µl), incubated for 1 hour, and harvested onto glass fiber filters using a PHD cell harvester (Cambridge Technology, Cambridge, MA). Results are expressed as a percentage of the isotope incorporation in cells treated with appropriate concentrations of control dialyzed lysate.

The results of this assay clearly indicate that OKT9 sFv-PE40 is capable of inhibiting protein synthesis with an IC50 (the concentration of a reagent that inhibits protein synthesis by 50%) of approximately 2 x 10^-9 M. The toxicity of the fusion protein, but not of PE, was abrogated in the presence of excess OKT9 MoAb (12 µg/ml), indicating that binding is specific for the TfnR. No toxicity was observed when K562 was replaced by Vero (an African green monkey cell line that expresses the simian version of the transferrin receptor), indicating that the OKT9 sFv retains the human receptor-specific antigen binding properties of the parent antibody.
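For orientation, the molar IC50 can be expressed in mass units using the approximate 70 kDa size of the fusion protein estimated by SDS-PAGE (an approximation; a sequence-derived mass would be more precise). A small helper with a hypothetical name:

```python
def molar_to_ng_per_ml(molar, mw_kda):
    """Convert mol/L to ng/ml for a protein of the given mass in kDa.

    mol/L x g/mol = g/L, and 1 g/L = 1 mg/ml = 1e6 ng/ml.
    """
    grams_per_litre = molar * mw_kda * 1000.0
    return grams_per_litre * 1e6


print(round(molar_to_ng_per_ml(2e-9, 70), 1))  # 140.0
```

With these inputs, 2 x 10^-9 M corresponds to roughly 140 ng/ml.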

After binding of the OKT9 sFv to TfnR had been demonstrated, the sFv nucleotide sequence was determined using dideoxynucleotide chain-terminating methods, confirming extensive homology with the respective regions of immunoglobulins of known sequence.

EXAMPLE 5: Characterization of single-chain antibody (sFv)-toxin fusion proteins produced in vitro in rabbit reticulocyte lysate

The present invention provides in vitro production of proteins containing a toxin domain (derived from diphtheria toxin (DT) or PE) fused to a single-chain antibody domain directed against the human transferrin receptor (TfnR). The expression of this antigen on the cell surface is coordinately regulated with cell growth; TfnR exhibits a limited pattern of expression in normal tissue but is widely distributed on carcinomas and sarcomas (Gatter et al. J. Clin. Pathol. 36:539-545, 1983), and may therefore be a suitable target for immunotoxin-based therapeutic strategies (Johnson, V. G. and Youle, R. J. in "Intracellular Trafficking of Proteins", Steer and Hanover eds., Cambridge Univ. Press, Cambridge, England, pp. 183-225; Batra et al., 1991; Johnson et al., 1988).

Proteins consisting of a fusion between an sFv directed against the TfnR and either the carboxyl-terminal 40 kDa of PE or the DT mutant CRM 107 [S(525)F] were expressed in rabbit reticulocyte lysates and found to be specifically cytotoxic to K562, a cell line known to express TfnR. In comparison, a chimeric protein consisting of a fusion between a second DT mutant, DTM1 [S(508)F, S(525)F], and the E6 sFv exhibited significantly lower cytotoxicity. Legal restrictions imposed on manipulating toxin genes in vivo previously prevented expression of potentially interesting toxin-containing fusion proteins (Federal Register 51(88)(III):16961 and Appendix F:16971); the present invention provides a novel procedure for in vitro gene construction and expression that satisfies the regulatory requirements, facilitating the first study of the potential of non-truncated DT mutants in fusion protein immunotoxins (ITs). The present data also demonstrate that functional recombinant antibodies can be generated in vitro.
Reagents 

DT and PE were purchased from List Biologicals (Campbell, CA). Nuclease-treated, methionine-free rabbit reticulocyte lysate and recombinant ribonuclease inhibitor (rRNasin) were obtained from Promega (Madison, WI). Tissue culture supplies were from GIBCO (Grand Island, NY) and Biofluids (Rockville, MD). Reagents for PCR were provided by Perkin-Elmer Cetus (Norwalk, CT). Restriction and nucleic acid modifying enzymes were from Stratagene (La Jolla, CA), as was the mCAP kit used to produce capped mRNA in vitro. Geneclean and RNaid kits (for the purification of DNA and RNA, respectively) were supplied by BIO 101 (La Jolla, CA). L-[35S]methionine, L-[14C(U)]leucine, and 5'-(α-thio)-[35S]dATP were from New England Nuclear (Boston, MA). [Adenylate-32P]NAD was supplied by ICN Biomedicals (Costa Mesa, CA).

Oligonucleotide Synthesis

Oligonucleotides were synthesized (0.2 µmol scale) using cyanoethyl phosphoramidites supplied by Milligen-Biosearch (Burlington, MA) on a dual-column Cyclone Plus DNA synthesizer. Post-synthesis purification was achieved using OPC cartridges (Applied Biosystems, Foster City, CA).
Plasmids 

pET-11d was the generous gift of Dr. F. William Studier, Brookhaven National Laboratory (Upton, NY). pHB21-PE40, a derivative of pET-11d containing the gene for PE40, was kindly supplied by Dr. David FitzGerald (NIH, Bethesda, MD). All plasmids were maintained and propagated in E. coli strain XL1-Blue (Stratagene, La Jolla, CA).
Cell Lines 

Corynebacterium diphtheriae strain C7s(β)tox+ (ATCC 27012) was obtained from the ATCC (Rockville, MD), and the strain producing the binding-deficient DT mutant CRM 103 was the generous gift of Dr. Neil Groman, University of Washington (Seattle, WA). Both strains were propagated in LB broth. K562 (a human erythroleukemia-derived cell line, ATCC CCL 243) was cultured in RPMI 1640 medium containing 24 mM NaHCO3, 10% fetal calf serum, 2 mM glutamine, 1 mM sodium pyruvate, 0.1 mM nonessential amino acids, and 10 µg/ml gentamicin. Vero (an African green monkey kidney line, ATCC CCL 81) was grown in Dulbecco's modified Eagle's medium supplemented as described above. All eukaryotic cells were cultured at 37°C in a 5% CO2 humidified atmosphere.
Splicing Genes using PCR 

Genes encoding antibody VL and VH were spliced, separated by a region encoding a 15 amino acid peptide linker, using a previously described PCR technique known as gene splicing by overlap extension (SOE) (Horton et al. Gene 77:61-68, 1989; Horton et al. BioTechniques 8:528-535, 1990). For studies requiring in vitro expression of PCR products, toxin gene-derived fragments were linked to those encoding the sFv using a similar method, without the use of restriction enzymes.

Construction of Plasmids Encoding Toxin-sFv Fusion Proteins
The gene encoding PE40 was obtained as an insert in pET-11d, and the sFv gene was cloned on the 5' side of this insert as indicated. To clone the gene encoding the DT binding-site mutant DTM1 [S(508)F, S(525)F], genomic DNA was isolated from the C. diphtheriae strain that produces CRM 103. DNA was extracted by a modification of the cetyltrimethylammonium bromide extraction procedure (Wilson, K. "Current Protocols in Molecular Biology", Ausubel et al. eds., John Wiley & Sons, New York, 2.4.1-2.4.5, 1988) and subjected to 20 cycles of PCR amplification. Primers were designed to: (i) amplify the 1605 bp region encoding CRM 103, concomitantly mutating the codon at position 525 from TCT to TTT; and (ii) incorporate restriction sites appropriate for cloning. The mutations present in CRM 107 and CRM 103 were thus combined on a single gene.
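The engineered change at position 525 (TCT to TTT) converts a serine codon to a phenylalanine codon, which is what the S(525)F designation records. A minimal sketch that applies an in-frame codon substitution and reports the resulting amino acid change using the standard genetic code; the function name and the short sequence are hypothetical, and codon numbering here counts from the first codon of the supplied sequence, which need not match mature-DT residue numbering:

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def substitute_codon(orf, codon_number, new_codon):
    """Replace codon `codon_number` (1-based, in frame with `orf`) and
    return the mutated sequence plus a label such as 'S2F'."""
    start = 3 * (codon_number - 1)
    old_codon = orf[start:start + 3].upper()
    new_codon = new_codon.upper()
    if len(old_codon) != 3 or len(new_codon) != 3:
        raise ValueError("codon outside the reading frame or not a triplet")
    mutated = orf[:start] + new_codon + orf[start + 3:]
    label = f"{CODON_TABLE[old_codon]}{codon_number}{CODON_TABLE[new_codon]}"
    return mutated, label


print(substitute_codon("ATGTCTAAA", 2, "TTT"))  # ('ATGTTTAAA', 'S2F')
```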

In Vitro Transcription of DNA Templates 
For transcription, DNA templates required a T7 RNA
polymerase promoter immediately upstream of the gene of
interest (Oakley, J. L. and Coleman, J. E. Proc. Natl. Acad. Sci.
U.S.A. 74:4266-4270, 1977). Such a promoter was conveniently
present in pET-11d (Studier et al. Methods Enzymol. 185:60-89, 1990).
In the case of PCR products, the upstream primer (a 57-mer,
T7-DT) was used to introduce all of the elements necessary for
in vitro transcription/translation. T7-DT includes a
consensus T7 RNA polymerase promoter, together with the first
seven codons of mature DT (Greenfield et al. Proc. Natl. Acad.
Sci. U.S.A. 80:6853-6857, 1983) immediately preceded by an ATG
translation initiation codon in the optimum Kozak context
(Kozak, M. J. Biol. Chem. 266:19867-19870, 1991).
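The two elements built into T7-DT can be checked mechanically. This is a minimal sketch, assuming two things:

* the widely used 18-nt T7 consensus promoter (TAATACGACTCACTATAG), with transcription starting at its final G;
* the usual Kozak rule of thumb, a purine at -3 and a G at +4, counting the A of ATG as +1.

The function names and the example template are hypothetical. The actual 57-mer sequence of T7-DT is not reproduced here.

```python
T7_PROMOTER = "TAATACGACTCACTATAG"  # consensus; transcription starts at the final G


def kozak_context(seq: str, atg: int) -> str:
    """Classify the context of the ATG starting at 0-based index `atg`."""
    if seq[atg:atg + 3] != "ATG":
        raise ValueError("no ATG at that position")
    if atg < 3 or atg + 3 >= len(seq):
        raise ValueError("need 3 nt upstream and 1 nt downstream of the ATG")
    purine_minus3 = seq[atg - 3] in "AG"
    g_plus4 = seq[atg + 3] == "G"
    if purine_minus3 and g_plus4:
        return "strong"
    if purine_minus3 or g_plus4:
        return "adequate"
    return "weak"


def check_template(seq: str) -> tuple[int, int, str]:
    """Locate the T7 promoter and the first downstream ATG, then rate its context."""
    seq = seq.upper()
    promoter = seq.find(T7_PROMOTER)
    if promoter < 0:
        raise ValueError("T7 promoter not found")
    atg = seq.find("ATG", promoter + len(T7_PROMOTER))
    if atg < 0:
        raise ValueError("no ATG downstream of the promoter")
    return promoter, atg, kozak_context(seq, atg)


# Hypothetical template: promoter, a spacer ending in GCCACC, then ATG G...
print(check_template("TAATACGACTCACTATAGGGAGCCACCATGGGC"))  # (0, 27, 'strong')
```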
m7G(5')ppp(5')G-capped RNA was produced by transcription from
linearized plasmids or PCR products using an mCAP kit,
according to the manufacturer's protocol. Prior to
translation, RNA was purified using an RNaid kit, recovered in
nuclease-free water, and analyzed by formaldehyde gel
electrophoresis.

In Vitro Expression of Fusion Proteins

L-[35S]methionine-labelled proteins (for analysis by
SDS-PAGE) were produced from capped RNA in methionine-free,
nuclease-treated rabbit reticulocyte lysate, according to the
supplier's instructions. Unlabeled proteins (for bioassay)
were produced under similar conditions, except that the isotope
was replaced with 20 µM unlabeled L-methionine. Control
lysate was produced by adding all reagents except exogenous
RNA. After translation, samples were dialysed overnight at
4°C against PBS, pH 7.4, in Spectra/Por 6 MWCO 50,000 tubing
(Spectrum, Houston, TX).

Prior to transcription, plasmids were linearized at
the BglII site and treated with proteinase K to destroy
ribonucleases that may contaminate the sample. After
phenol/chloroform extraction and ethanol precipitation, DNA
was dissolved in nuclease-free water to a concentration of
approximately 0.2 µg/µl. m7G(5')ppp(5')G-capped RNA was
synthesized by T7 RNA polymerase using the conditions
recommended by the manufacturer, and its integrity was
confirmed by formaldehyde gel electrophoresis. Capped RNA was
translated in a commercially available rabbit reticulocyte
lysate, according to the instructions of the manufacturer. It
is clear from the gel that the major band in each case has a
molecular weight corresponding to that of the protein of
interest, and that relatively large molecules (approximately
120 kDa in the case of DTM1-E6 sFv-PE40) can be synthesized in
the lysate using the conditions described.
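The template stock above is specified by mass (about 0.2 µg/µl). The sketch below converts it to a molar concentration. It is a minimal sketch, assuming the common approximation of 650 g/mol per base pair of double-stranded DNA. The 6,000 bp length in the example is hypothetical and is not the size of any plasmid described here.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA (approximation)


def dsdna_nanomolar(conc_ug_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration in µg/µl to nM for a molecule of length_bp."""
    grams_per_litre = conc_ug_per_ul  # 1 µg/µl is numerically equal to 1 g/L
    molar = grams_per_litre / (length_bp * AVG_BP_MASS)
    return molar * 1e9


print(round(dsdna_nanomolar(0.2, 6000), 1))  # 51.3
```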

Immediately following translation, samples were
extensively dialyzed overnight at 4°C against PBS, pH 7.4.
The dialysis step was found to be essential, because non-dialyzed
rabbit reticulocyte lysate resulted in the
incorporation of significantly lower amounts of 14C-leucine
upon assay by protein synthesis inhibition in all cell lines
tested. After the concentration of the newly
synthesized protein had been determined using a standard assay for measuring
ADP-ribosyltransferase activity (Johnson et al., 1988), the
cytotoxic activity of samples was immediately determined.
ADP-ribosyl Transferase Assay 

The enzymatic activity (and therefore molarity) of
fusion proteins was determined by comparison with DT or PE
standard curves, as described previously (Johnson et al.,
1988). Appropriate volumes of control lysate were added to
each standard curve sample, in order to control for the
presence of significant levels of EF-2 in reticulocyte lysate.
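Reading a sample's concentration off a DT or PE standard curve is an interpolation step. This is a minimal sketch with hypothetical numbers, assuming the following:

* ADP-ribosyltransferase signal rises monotonically with toxin concentration;
* each standard already contains a matched volume of control lysate, as described above, so samples and standards share the same EF-2 background.

Linear interpolation between neighbouring standards is a simplification of whatever curve fit was actually used, and `interpolate_standard` is a hypothetical name.

```python
from bisect import bisect_left


def interpolate_standard(std_conc, std_signal, sample_signal, dilution=1.0):
    """Estimate concentration by linear interpolation on a monotonic standard curve."""
    pairs = sorted(zip(std_signal, std_conc))
    signals = [s for s, _ in pairs]
    concs = [c for _, c in pairs]
    if not signals[0] <= sample_signal <= signals[-1]:
        raise ValueError("sample signal lies outside the standard curve")
    i = bisect_left(signals, sample_signal)
    if signals[i] == sample_signal:
        return concs[i] * dilution
    s0, s1 = signals[i - 1], signals[i]
    c0, c1 = concs[i - 1], concs[i]
    return (c0 + (c1 - c0) * (sample_signal - s0) / (s1 - s0)) * dilution


# Hypothetical standards (nM toxin) and their ADP-ribosylation signals (cpm).
standards_nM = [0.0, 1.0, 2.0, 4.0]
signals_cpm = [100, 600, 1100, 2100]
print(interpolate_standard(standards_nM, signals_cpm, 850, dilution=10))  # 15.0
```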

Other Methods

SDS-PAGE was performed as previously described
(Laemmli, U. K. Nature 227:680-685, 1970), using 10-20%
gradient gels (Daiichi, Tokyo, Japan). Once electrophoresis
was complete, gels were fixed for 15 minutes in 10% methanol,
7% acetic acid, and then soaked for 30 minutes in
autoradiography enhancer (Amplify, Amersham, Arlington Heights,
IL). After drying, autoradiography was performed overnight
using X-OMAT AR2 film (Eastman Kodak, Rochester, NY) in the
absence of intensifying screens. Dideoxynucleotide chain-termination
sequencing of double-stranded DNA templates was
performed using a Sequenase II kit (United States Biochemical
Corp., Cleveland, OH), according to the manufacturer's
protocol.

Cytotoxicity of Toxin-sFv Fusion Proteins Expressed in
Reticulocyte Lysates

The cytotoxic activity of fusion proteins was
determined by their ability to inhibit protein synthesis in
relevant cell lines (e.g., K562). Assays were performed as
described previously (Johnson et al., 1988). Briefly, samples
were serially diluted in ice-cold PBS, 0.2% BSA, and 11 µl
volumes were added to the appropriate well of a 96-well
microtiter plate (containing 10^4 cells/well in leucine-free
RPMI 1640). After carefully mixing the contents of each well,
the plate was incubated for the indicated time at 37°C in a 5%
CO2 humidified atmosphere. Each well was then pulsed with
20 µl of L-[14C(U)]leucine (0.1 µCi/20 µl), incubated for 1
hour, and harvested onto glass fiber filters using a PHD cell
harvester (Cambridge Technology, Cambridge, MA). Results were
expressed as a percentage of the isotope incorporation in
cells treated with appropriate concentrations of control
dialyzed lysate.
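Results are expressed as a percentage of incorporation in control-lysate wells, so the raw counts are first normalized and an IC50 is then read from the dose-response. This is a minimal sketch, assuming log-linear interpolation between the two dilutions that bracket 50% of control. The text does not state how IC50 values were derived. The counts and the helper names are hypothetical.

```python
import math


def percent_of_control(cpm, control_cpm):
    """[14C]leucine incorporation as a percentage of control-lysate wells."""
    return [100.0 * c / control_cpm for c in cpm]


def ic50_log_interp(concentrations, percents):
    """Concentration giving 50% of control, by log-linear interpolation.

    Assumes concentrations are ascending and the response falls with dose.
    Returns None if no pair of adjacent points brackets 50%.
    """
    points = list(zip(concentrations, percents))
    for (c0, p0), (c1, p1) in zip(points, points[1:]):
        if p0 >= 50.0 >= p1 and p0 != p1:
            frac = (p0 - 50.0) / (p0 - p1)
            log_c = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
            return 10 ** log_c
    return None


# Hypothetical tenfold dilution series (M) and counts per well.
doses = [1e-12, 1e-11, 1e-10, 1e-9]
pct = percent_of_control([5000, 4000, 1000, 250], control_cpm=5000)
print(pct)  # [100.0, 80.0, 20.0, 5.0]
print(f"{ic50_log_interp(doses, pct):.2e}")  # 3.16e-11
```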

The results of the protein synthesis inhibition assay
clearly indicate that PE40-containing fusion proteins
synthesized in cell-free reticulocyte lysates are highly
cytotoxic to this cell line (IC50 ≈ 1 x 10^-10 M). In contrast,
DTM1-E6 sFv was at least ten-fold less toxic to K562 than the
PE40-containing fusion protein, despite the fact that it
exhibited ADP-ribosyl transferase activity indistinguishable
from that of wt DT synthesized from an equivalent amount of
RNA in an identical reticulocyte lysate mix. Since the
decreased toxicity of DTM1-E6 sFv is clearly not due to a
deficit in enzymatic activity, the binding and/or
translocation process is implicated. Possible mechanisms by
which the sFv-antigen interaction could be inhibited include
(i) misfolding of the sFv domain or (ii) steric interactions
with other regions of the fusion protein preventing close
association of sFv with the TfnR. It is of interest that a
tripartite protein, DTM1-E6 sFv-PE40, was significantly
cytotoxic to K562 (IC50 around 1 x 10^-10 M, similar to that of
PE40-E6 sFv), and the toxic effect was clearly mediated via
the TfnR, since this activity was blocked by addition of
excess E6 Mab. Although it is possible that the inclusion of
the PE40 moiety at the carboxyl end of the tripartite molecule
results in a significant conformational change in domains more
proximal to the amino terminus, it seems unlikely that the sFv
binding domain of DTM1-E6 is misfolded or unavailable to
interact with the TfnR. Interactions of DTM1-E6 sFv with the
cell surface could be measured in a direct binding assay
(Greenfield et al. Science 238:536-539, 1987), but these
studies were not performed in the course of this
investigation. Nevertheless, it appears likely that the lack
of toxicity of the DTM1-E6 sFv fusion protein is due to a
deficit in its translocation function.

The expression system developed is rapid and easy, and
facilitates the manipulation of a number of samples at once.
No complicated protein purification or refolding procedures
are required, and the method can be used to express proteins
which, due to restrictions imposed on the manipulation of
toxin-encoding genes, could not be produced by more
conventional methods. The technique is ideal for ascertaining
the suitability of new sFv for IT development; it is
theoretically possible to assemble the sFv-encoding gene (and
that encoding the IT itself) by splicing of PCR products
derived directly from the hybridoma, without the necessity for
cloning. This would facilitate the selection of the most
promising candidate molecule, prior to investing considerable
effort and expense in large-scale protein production and
purification. Toxins and toxin-containing fusion proteins are
proving to be powerful aids in our understanding of receptor-mediated
endocytosis and intracellular routing, and are
providing valuable insight into normal cell function (reviewed
in ref. 2). The method described simplifies the generation of
such molecules, and facilitates their production and use in
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expression methods would be impractical . 

Example 6: Cassette Mutagenesis to Produce PAHIV Mutants.

Three pieces of DNA are joined together. Piece A carries
vector sequences and encodes the "front half" (5' end of the
gene) of the PA protein; piece B is a short piece of DNA (referred to as a
cassette) that encodes a small middle portion of the PA protein; and
piece C encodes the "back half" (3' end of the gene) of
PA.

PA proteins with alternate HIV-1 cleavage sites were created by
a cassette mutagenesis procedure. Eight deoxyoligonucleotides
were synthesized for construction of cassettes coding for
specifically designed amino acid sequences. Each of the four
cassettes was generated by annealing two synthetic
oligonucleotides (primers).

Primer 1A CG CAA GTA TCA CAA AAT TAT CCG ATC GTG CAA AAC ATA CTG CAG G
              Q   V   S   Q   N   Y   P   I   V   Q   N   I   L   Q

Primer 1B G TCC CTG CAG TAT GTT TTG CAC GAT CGG ATA ATT TTG TGA TAC TTG

Primer 2A CG AAC ACT GCC ACT ATC ATG ATG CAA CGT GGT AAT TTT CTG CAG G
              N   T   A   T   I   M   M   Q   R   G   N   F   L   Q

Primer 2B G TCC CTG CAG AAA ATT ACC ACG TTG CAT CAT GAT AGT GGC AGT GTT

Primer 3A CG ACT GTC TCT TTT AAC TTC CCG CAA ATC ACG CTT TGG CTG CAG G
              T   V   S   F   N   F   P   Q   I   T   L   W   L   Q

Primer 3B G TCC CTG CAG CCA AAG CGT GAT TTG CGG GAA GTT AAA AGA GAC AGT

Primer 4A CG GGC GGT TCT GCC TTT AAC TTC CCG ATC GTC ATG GGA GGT CTG CAG G
              G   G   S   A   F   N   F   P   I   V   M   G   G   L   Q

Primer 4B G TCC CTG CAG ACC TCC CAT GAC GAT CGG GAA GTT AAA GGC AGA ACC GCC

The underlined portion of each protein sequence is
recognized and cleaved by the HIV-1 protease.
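Each primer pair can be checked in two ways: the B strand should be the reverse complement of the A strand apart from the single-stranded ends, and translating the A strand should give the listed peptide. This is a minimal sketch. The end assignments are an assumption read off the sequences themselves: a 5' CG on the A strand is consistent with a BstBI end, and a 5' GTC on the B strand is consistent with a PpuMI end. The final G of each A strand is taken to begin the downstream Gly codon (GGA) seen in the PAHIV#2 sequence below. The helper names are hypothetical.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def check_cassette(strand_a: str, strand_b: str,
                   a_overhang: str = "CG", b_overhang: str = "GTC") -> str:
    """Verify that two oligos anneal into a cassette and return the encoded peptide."""
    if not (strand_a.startswith(a_overhang) and strand_b.startswith(b_overhang)):
        raise ValueError("unexpected single-stranded ends")
    core = strand_a[len(a_overhang):]
    if strand_b[len(b_overhang):] != reverse_complement(core):
        raise ValueError("strands are not complementary")
    # The last base of the A strand starts the next (Gly) codon, so it is not translated.
    return translate(core[:-1])


PRIMERS = {
    1: ("CGCAAGTATCACAAAATTATCCGATCGTGCAAAACATACTGCAGG",
        "GTCCCTGCAGTATGTTTTGCACGATCGGATAATTTTGTGATACTTG"),
    2: ("CGAACACTGCCACTATCATGATGCAACGTGGTAATTTTCTGCAGG",
        "GTCCCTGCAGAAAATTACCACGTTGCATCATGATAGTGGCAGTGTT"),
    3: ("CGACTGTCTCTTTTAACTTCCCGCAAATCACGCTTTGGCTGCAGG",
        "GTCCCTGCAGCCAAAGCGTGATTTGCGGGAAGTTAAAAGAGACAGT"),
    4: ("CGGGCGGTTCTGCCTTTAACTTCCCGATCGTCATGGGAGGTCTGCAGG",
        "GTCCCTGCAGACCTCCCATGACGATCGGGAAGTTAAAGGCAGAACCGCC"),
}

for pair, (a, b) in PRIMERS.items():
    print(pair, check_cassette(a, b))
# 1 QVSQNYPIVQNILQ
# 2 NTATIMMQRGNFLQ
# 3 TVSFNFPQITLWLQ
# 4 GGSAFNFPIVMGGLQ
```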

Primer pair 1 encodes a protein sequence that
duplicates part of the cleavage site found between the
membrane-associated protein and the capsid protein.
Primer pair 2 encodes a protein sequence that
duplicates part of the cleavage site between the capsid and
the nucleocapsid protein.

Primer pair 3 encodes a protein sequence that
duplicates part of the cleavage site between the protease and
the p6 protein. Like the protease, p6 is a portion of the
large protein produced by HIV.

Primer pair 4 encodes a protein sequence that should
be cleaved by the protease. It was created by examining
several protein sequences that are recognized by the HIV
protease and using the common residues from each sequence.
Glycine residues were added to each end to make the molecule
more flexible.

The mutagenic cassettes were ligated with the
BamHI/BstBI fragment from plasmid pYS5 and the PpuMI/BamHI
fragment from plasmid pYS6. Plasmids shown to have correct
restriction maps were transformed into the E. coli dam- dcm-
strain GM2163 (available from New England Biolabs, Beverly,
MA). Unmethylated plasmid DNA was purified from each mutant
and used to transform B. anthracis. For methods, see Klimpel,
et al. Proc. Natl. Acad. Sci. 89:10277-10281 (1992). The construction
of pYS5 and pYS6 is described in Singh, et al. J. Biol.
Chem. 264:19103-19107 (1989).
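Checking restriction maps starts with locating recognition sites. This is a minimal sketch using IUPAC-coded patterns for the enzymes named above:

* BamHI: GGATCC
* BstBI: TTCGAA
* PpuMI: RGGWCCY

It also includes PstI (CTGCAG), a site that happens to be present in all four cassette sequences. The text does not state whether PstI was used for screening. All four patterns are palindromic, so scanning one strand suffices. The function names are hypothetical.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]", "N": "[ACGT]"}

SITES = {"BamHI": "GGATCC", "BstBI": "TTCGAA", "PpuMI": "RGGWCCY", "PstI": "CTGCAG"}


def to_regex(site: str) -> str:
    """Expand an IUPAC recognition sequence into a regular expression."""
    return "".join(IUPAC[base] for base in site)


def find_sites(seq: str, sites: dict = SITES) -> dict:
    """Return 0-based start positions of each (palindromic) recognition site."""
    seq = seq.upper()
    # The lookahead lets overlapping occurrences be reported.
    return {name: [m.start() for m in re.finditer(f"(?={to_regex(site)})", seq)]
            for name, site in sites.items()}


print(find_sites("TTCGAAGGACCCTGCAG"))
# {'BamHI': [], 'BstBI': [0], 'PpuMI': [5], 'PstI': [11]}
```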

The nucleotide and amino acid sequences of the mature
PA protein after alteration with primer set 2 are shown below.
Nucleotide residues 482 to 523 were replaced with cassette 2,
resulting in replacement of amino acid residues 162-171 of PA
with residues NTATIMMQRGNFLQ (PAHIV#2). The altered DNA
sequence and the new amino acid residues are underlined.



Sequence Range: 1 to 2220



GAA GTT AAA CAG GAG AAC CGG TTATTAAAT GAA TCAGAA TCAAGT TCC CAG GGG TTACTA 
CTT CAATTT GTC CTC TTG GCC AATAATTTA CTT AGT CTT AGTTCAAGGGTC CCC AATGAT 
Glu Val Lys Gln Glu Asn Arg Leu Leu Asn Glu Ser Glu Ser Ser Ser Gln Gly Leu Leu>



GGA TACTAT TTT AGT GAT TTG AATTTTCAAGCA CCC ATG GTGGTTACCTCTTCT ACTACA 
CCT ATG ATA AAA TCA CTAAAC TTAAAAGTT CGT GGGTAC CAC CAATGG AGA AGA TGATGT 
Gly Tyr Tyr Phe Ser Asp Leu Asn Phe Gln Ala Pro Met Val Val Thr Ser Ser Thr Thr>



GGG GATTTATCT ATT CCT AGT TCT GAGTTA GAA AAT ATT CCATCG GAAAAC CAA TATTTT 
CCC CTAAAT AGA TAA GGA TCA AGACTCAAT CTT TTATAA GGTAGC CTT TTG GTT ATAAAA 
Gly Asp Leu Ser Ile Pro Ser Ser Glu Leu Glu Asn Ile Pro Ser Glu Asn Gln Tyr Phe>



CAA TCTGCT ATT TGG TCA GGA TTTATCAAAGTT AAGAAG AGT GAT GAATAT ACA TTTGCT 
GTT AGACGA TAA ACC AGT CCT AAATAGTTT CAA TTCTTC TCACTA CTT ATA TGT AAACGA 
Gln Ser Ala Ile Trp Ser Gly Phe Ile Lys Val Lys Lys Ser Asp Glu Tyr Thr Phe Ala>



ACT TCCGCT GAT AAT CAT GTA ACAATGTGG GTA GAT G AC CAAGAAGTGATT AAT AAAGCT 
TGA AGGCGA CTATTA GTA CAT TGTTAC ACC CAT CTACTG GTT CTT CAC TAA TTA TTTCGA 
Thr Ser Ala Asp Asn His Val Thr Met Trp Val Asp Asp Gln Glu Val Ile Asn Lys Ala>



TCT AATTCT AAC AAA ATC AGA TTAGAAAAA. GGA AGATTA TAT CAA ATAAAA ATT CAATAT 
AGA TTAAGA TTG TTT TAG TCT AATCTTTTT CCT TCT AAT ATA GTT TATTTT TAA GTT ATA 
Ser Asn Ser Asn Lys Ile Arg Leu Glu Lys Gly Arg Leu Tyr Gln Ile Lys Ile Gln Tyr>



CAA CGAGAA AAT CCT ACT GAA AAAGGATTG GAT TTCAAG TTG TAC TGG ACC GAT TCT CAA 
GTT GCTCTT TTA GGA TGA CTT TTTCCTAAC CTA AAGTTC AACATGACCTGG CTA AGAGTT 
Gln Arg Glu Asn Pro Thr Glu Lys Gly Leu Asp Phe Lys Leu Tyr Trp Thr Asp Ser Gln>



AAT AAAAAR GAA GTG ATT TCT AGTGATAAC TTA CAATTG CCAGAATTAAAA CAA AAATCT 
TTA TTTTTT CTT CAC TAA AGATCACTATTG AAT GTT AAC GGT CTT AAT TTT GTT TTT AGA 
Asn Lys Lys Glu Val Ile Ser Ser Asp Asn Leu Gln Leu Pro Glu Leu Lys Gln Lys Ser>



TCGAAC ACT GC C ACT ATC ATG ATG CAA CGT GGT AAT TTTCTG CAG G GA CCTACG GTT CCA 
AGCTTGTGACGGTGATAGTACTACGTTGCACCATTAAAAGACGTC CCT GGATGC CAAGGT 
Ser Asn Thr Ala Thr Ile Met Met Gln Arg Gly Asn Phe Leu Gln Gly Pro Thr Val Pro



GAC CGTGAC AAT GAT GGA ATC CCTGATTCATTA GAGGTA GAAGGATATACG GTT GATGTC 
CTG GCACTG TTA CTA CCT TAG GGACTAAGT AAT CTCCAT CTT CCTATATGC CAA CTACAG 
Asp Arg Asp Asn Asp Gly Ile Pro Asp Ser Leu Glu Val Glu Gly Tyr Thr Val Asp Val>



AAA AAT AAA AGA ACT TTT CTT TCAC CATGG ATT TCT AAT ATT CAT GAAAAG AAA GGATTA 
TTT TTATTT TCT TGA AAA GAA AGTGGT ACC TAA AGATTA TAAGTA CTTTTC TTT CCT AAT 
Lys Asn Lys Arg Thr Phe Leu Ser Pro Trp Ile Ser Asn Ile His Glu Lys Lys Gly Leu>
ACC AAATAT AAA TCA TCT CCT GAAAAATGG AGC ACGGCT TCTGAT CCGTAC AGT GATTTC
TGG TTTATA TTT AGT AGA GGA CTTTTTACC TCG TGC CGA AGACTAGGC ATG TCA CTAAAG 
Thr Lys Tyr Lys Ser Ser Pro Glu Lys Trp Ser Thr Ala Ser Asp Pro Tyr Ser Asp Phe>



GAA AAGGTT ACA GGA CGGATT GATAAGAATGTA TCACCA GAG GCA AGACAC CCC CTTGTG 
CTT TTC CAA TGT CCT GCC TAA CTATTCTTACAT AGTGGT CTC CGT TCTGTG GGG GAACAC 
Glu Lys Val Thr Gly Arg Ile Asp Lys Asn Val Ser Pro Glu Ala Arg His Pro Leu Val>



GCA GCTTAT CCG ATT GTA CAT GTAGATATG GAG AAT ATT ATT CTC TCAAAA AAT GAGGAT 
CGT CGAATA GGC TAA CAT GTA CATCTATAC CTC TTATAA TAAGAG AGTTTT TTA CTCCTA 
Ala Ala Tyr Pro Ile Val His Val Asp Met Glu Asn Ile Ile Leu Ser Lys Asn Glu Asp>



CAA TCC ACA CAG AAT ACT GAT AGTGAAACG AGA ACAATA AGT AAA AAT ACT TCT ACAAGT 
GTT AGGTGT GTC TTA TGA CTA TCACTTTGC TCT TGTTAT TCATTT TTATGA AGA TGTTCA 
Gln Ser Thr Gln Asn Thr Asp Ser Glu Thr Arg Thr Ile Ser Lys Asn Thr Ser Thr Ser>



AGG ACACAT ACT AGT GAA GTA CATGGAAAT GCA GAAGTG CATGCG TCG TTC TTT GATATT 
TCC TGTGTA TGA TCA GTT CAT GTACCTTTA CGT CTTCAC GTACGC AGC AAG AAA CTATAA 
Arg Thr His Thr Ser Glu Val His Gly Asn Ala Glu Val His Ala Ser Phe Phe Asp Ile>



GGT GGG AGT GTA TCT GCA GGA TTT AGT AAT TCG AAT TCA AGT ACG GTC GCA ATT GATCAT 
CCA CCCTCA CAT AGA CGT CCT AAATCATTA AGC TTAAGT TCATGC CAG CGT TAA CTAGTA 
Gly Gly Ser Val Ser Ala Gly Phe Ser Asn Ser Asn Ser Ser Thr Val Ala Ile Asp His>



TCA CTATCT CTA GCA GGG GAA AGAACTTGG GCT GAAACAATGGGTTTAAATACC GCTGAT 
AGT GATAGA GAT CGT CCC CTT TCTTGAACC CGA CTT TGT TAC CCAAATTTATGG CGA CTA 
Ser Leu Ser Leu Ala Gly Glu Arg Thr Trp Ala Glu Thr Met Gly Leu Asn Thr Ala Asp>



ACA GCAAGA TTA AAT GCC AAT ATTAGATAT GTA AAT ACT GGG ACG GCT CCA ATC TAC AAC 
TGT CGTTCT AAT TTA CGG TTA TAATCTATA CAT TTATGA CCC TGC CGAGGT TAG ATGTTG 
Thr Ala Arg Leu Asn Ala Asn Ile Arg Tyr Val Asn Thr Gly Thr Ala Pro Ile Tyr Asn>



GTG TTACCAACG ACTTCGTTAGTGTTAGGAAAAAATCAAACACTCGCGACAATTAAAGCT 
CAC AATGGT TGC TGA AGC AAT CACAATCCT TTT TTAGTT TGT GAG CGC TGT TAA TTT CGA 
Val Leu Pro Thr Thr Ser Leu Val Leu Gly Lys Asn Gln Thr Leu Ala Thr Ile Lys Ala>



AAG GAAAAC CAA TTAAGT CAAATACTTGCACCTAATAATTATTATCCTTCT AAA AACTTG 
TTC CTTTTG GTT AAT TCA GTT TATGAACGT GGA TTA TTA ATAATAGGAAGATTT TTGAAC 
Lys Glu Asn Gln Leu Ser Gln Ile Leu Ala Pro Asn Asn Tyr Tyr Pro Ser Lys Asn Leu>



GCG CCAATC GC^TTAAATGCACAAGACGATTTC AGTTCTACTCCAATTACAATGAATTAC 
CGC GGTTAG CGT AAT TTA CGT GTTCTGCTAAAG TCAAGA TGA GGT TAATGT TAC TTAATG 
Ala Pro Ile Ala Leu Asn Ala Gln Asp Asp Phe Ser Ser Thr Pro Ile Thr Met Asn Tyr>
GGG AAT ATA GCAACA TAC AAT TTTGAAAAT GGA AGAGTG AGG GTG GATACA GGC TCGAAC
CCC TTATAT GCT TGT ATG TTA AAACTTTTA CCT TCT CAC TCC CAC CTATGT CCG AGCTTG 
Gly Asn Ile Ala Thr Tyr Asn Phe Glu Asn Gly Arg Val Arg Val Asp Thr Gly Ser Asn

1500 

TGG AGTGAAGTG TTA CCG CAAATTCAAGAAACA ACTGCA CGTATC ATTTTT AAT GGAAAA 
ACC TCACTT CAC AAT GGC GTT TAAGTTCTT TGT TGA CGT GCATAG TAAAAA TTA CCTTTT

Trp Ser Glu Val Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg Ile Ile Phe Asn Gly Lys



GAT TTAAAT CTG GTAGAAAGG CGGATAGCG GCG GTT AAT CCT AGT GAT CCA TTA GAAACG 
CTA AATTTAGAC CAT CTT TCC GCCTATCGC CGC CAATTA GGATCA CTAGGT AAT CTTTGC 
Asp Leu Asn Leu Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser Asp Pro Leu Glu Thr



ACT AAACCG GAT ATG ACATTA AAAGAAGCC CTT AAAATA GCATTT GGATTTAAC GAACCG 
TGA TTTGGC CTA TAC TGT AAT TTTCTTCGG GAA TTTTAT CGT AAA CCT AAA TTG CTT GGC 
Thr Lys Pro Asp Met Thr Leu Lys Glu Ala Leu Lys Ile Ala Phe Gly Phe Asn Glu Pro



AAT GGAAAC TTA CAA TAT CAA GGG AAAGAC ATA ACC GAA TTT GAT TTT AAT TTC GATCAA 
TTA CCTTTG AAT GTT ATA GTT CCCTTTCTG TAT TGG CTT AAACTA AAATTA AAG CTAGTT 
Asn Gly Asn Leu Gln Tyr Gln Gly Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe Asp Gln



CAAACATCT CAA AAT ATC AAG AAT CAGTTA GCG GAA TTA AAC G CA ACT AAC ATA TAT ACT 
GTT TGTAGA GTT TTA TAG TTC TTAGTC AAT CGC CTT AAT TTG CGT TGATTG TAT ATATGA 
Gln Thr Ser Gln Asn Ile Lys Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr Thr



GTA TTAGAT AAA ATC AAA TTA AATGCAAAAATG AATATT TTAATA AGAGAT AAA CGTTTT 
CAT AATCTATTT TAG TTT AAT TTACGTTTT TAC TTATAA AAT TAT TCT CTA TTT GCAAAA 
Val Leu Asp Lys Ile Lys Leu Asn Ala Lys Met Asn Ile Leu Ile Arg Asp Lys Arg Phe



CAT TATGAT AGA AAT AAC ATA GCAGTTGGG GCG GATGAG TCAGTAGTTAAG GAG GCT CAT 
GTA ATACTA TCT TTA TTG TAT CGTCAACCC CGC CTACTC AGT CAT CAATTC CTC CGAGTA 
His Tyr Asp Arg Asn Asn Ile Ala Val Gly Ala Asp Glu Ser Val Val Lys Glu Ala His



AGA GAAGTA ATT AAT TCG TCA ACAGAGGGATTA TTG TTA AATATT GAT AAG GAT ATAAGA 
TCT CTT CAT TAATTA AGC AGT TGTCTCCCTAAT AAC AAT TTATAA CTATTC CTATATTCT 
Arg Glu Val Ile Asn Ser Ser Thr Glu Gly Leu Leu Leu Asn Ile Asp Lys Asp Ile Arg



AAA ATATTA TCA GGT TAT ATT GTAGAAATT GAA GAT ACT GAA GGG CTT AAA GAA GTT ATA 
TTT TATAAT AGT CCA ATA TAA CATCTTTAA CTT CTATGA CTTCCC GAATTT CTT CAATAT 
Lys Ile Leu Ser Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu Gly Leu Lys Glu Val Ile



AAT GACAGA TAT GAT ATG TTG AATATTTCT AGT TTACGG CAA GAT GGAAAA ACA TTTATA 
TTA CTGTCT ATA CTA TAC AAC TTATAAAGATCA AATGCC GTT CTA CCTTTT TGT AAATAT 
Asn Asp Arg Tyr Asp Met Leu Asn Ile Ser Ser Leu Arg Gln Asp Gly Lys Thr Phe Ile
2100

GAT TTTAAA AAA TAT AAT GAT AAATTACCG TTA TAT ATA AGTAAT CCCAAT TAT AAGGTA 
CTA AAATTT TTT ATA TTA CTA TTTAATGGC AAT ATATAT TCATTA GGG TTA ATA TTC CAT 
Asp Phe Lys Lys Tyr Asn Asp Lys Leu Pro Leu Tyr Ile Ser Asn Pro Asn Tyr Lys Val

2160 

AAT GTATAT GCT GTT ACT AAA GAAAAC ACT ATT ATT AAT CCTAGT GAG AAT GGG GATACT 
TTA CAT ATA CGA CAA TGATTT CTTTTGTGATAA TAATTA GGATCACTCTTA CCC CTATGA

Asn Val Tyr Ala Val Thr Lys Glu Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly Asp Thr

2220 

AGT ACCAAC GGG ATC AAG AAA ATTTTAATC TTT TCTAAA AAAGGC TATGAG ATA GGATAA

TCA TGGTTG CCC TAG TTC TTT TAAAATTAG AAA AGATTT TTT CCG ATACTC TAT CCTATT 
Ser Thr Asn Gly Ile Lys Lys Ile Leu Ile Phe Ser Lys Lys Gly Tyr Glu Ile Gly ***

The above procedure was followed for PAHIV#1, #3 and #4.

Example 7: Cleavage of Mutant PAHIV Proteins in vitro.

The mutated proteins were treated with purified HIV-1
protease and evaluated for their degree of cleavage with
respect to time. The purified protease was obtained from the
NIH AIDS Research and Reference Reagent Program, Division of
AIDS, NIAID, Bethesda, MD. Alternatively, the protease can be
purified following the method of Louis, et al., Eur. J.
Biochem., 199:361 (1991).

Extended incubation (12 hours) of PA or the mutated PA
proteins with the purified HIV-1 protease resulted in the
appearance of two additional protein fragments that were not
anticipated. These two fragments are approximately 53
kilodaltons and 30 kilodaltons in size. This may represent
cleavage of PA and mutant PA proteins at a site recognized by
the HIV-1 protease between PA residues Y259 and P260. The
residues around this cleavage site, VAAYPIVHV (residues 256-264), have not
previously been identified as a potential HIV-1 protease
cleavage site.
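A rough size check supports the proposed Y259/P260 cut. The PAHIV#2 sequence shown above spans 2,220 nt, i.e., 739 codons plus a stop. Removing the four extra residues introduced by cassette 2 (14 in place of 10) leaves 735 residues for mature PA. The sketch below is minimal, assuming an average residue mass of about 110 Da, an approximation that ignores actual composition. `fragment_masses_kda` is a hypothetical helper.

```python
AVG_RESIDUE_DA = 110.0  # approximate mean mass of an amino-acid residue


def fragment_masses_kda(total_residues: int, cut_after: int) -> tuple[float, float]:
    """Approximate N- and C-terminal fragment masses (kDa) for a single cleavage."""
    n_term = cut_after * AVG_RESIDUE_DA / 1000
    c_term = (total_residues - cut_after) * AVG_RESIDUE_DA / 1000
    return round(n_term, 1), round(c_term, 1)


print(fragment_masses_kda(735, 259))  # (28.5, 52.4)
```

These estimates are in line with the approximately 30 kDa and 53 kDa fragments observed. Because the 110 Da rule is only approximate, differences of a few kDa are expected.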

Incubation of RAW 264.7 cells (ATCC No. TIB 71) with
lethal factor (LF) and HIV-1 protease-cleaved PAHIV#1 or
PAHIV#4 caused cell death, demonstrating that the mutated PA
proteins are capable of binding LF and thus also the toxic LF/PE
fusion proteins. PAHIV, PAHIV#2 and PAHIV#3 have not yet been
tested.
Example 8: Evaluation of cytotoxic agents in cell cultures.

The ability of the PA constructs containing the HIV-1
protease cleavage site to promote killing of HIV-1-infected
cells is being evaluated in COS-1 cells (ATCC No. CRL 1650)
transfected with the vector HIV-gpt. When COS cells are
transfected with this plasmid vector, they express all the
genes for the production of HIV-1 virus particles except the
envelope protein, gp160 (Page, K.A., et al., 1990. J. Virol.
64:5270-5276). Without the envelope protein the particles are
not infectious. These cells express the HIV-1 protease and
properly cleave the viral protein p55 to p24 (Page, K.A., et
al., 1990. J. Virol. 64:5270-5276). These properties make the
transfected cells an excellent model system in which to
evaluate the ability of protein constructs of the invention to
eliminate HIV-1-infected cells from culture.

The COS-1 cells were transfected with the plasmid
vector, and the resulting cultures are being selected for
stable transfectants. The mutated PA proteins (PAHIV#1,
PAHIV#2, PAHIV#3 and PAHIV#4) are added to the culture media
of growing HIV-gpt-transfected COS-1 cells in the presence of
the lethal factor fusion protein FP53 (Arora, N. et al. J.
Biol. Chem. 267:15542 (1992)). Only cells that properly
cleave the mutated PA proteins are able to bind the toxic LF
fusion protein. The cultures are evaluated for protein
expression (an indirect measure of viability) after 36 hours
(Arora, N. and S. H. Leppla. 1992. J. Biol. Chem. 268:3334).

Example 9: Treatment of an HIV-1 infected patient.

A human patient who is infected with HIV-1 is selected
for treatment. Although infected, this particular patient is
asymptomatic. The patient weighs 70 kilograms. A dose of 10
micrograms per kilogram, or 700 micrograms, of a PAHIV in normal
saline is prepared. This dosage is injected into the patient
intravenously as a bolus. The dose is repeated weekly for a
total of 4 to 6 dosages. The patient is evaluated regularly,
such as weekly, in terms of his symptoms, physical exam and
laboratory analysis according to the clinician's judgment.
Tests of particular interest include the patient's complete
blood count and examination for the presence of HIV infection.
The treatment regimen can be repeated with or without
alterations at the discretion of the clinician.

Unless defined otherwise, all technical and scientific 
terms used herein have the same meaning as commonly understood 
by one of ordinary skill in the art to which this invention 
belongs. Although any methods and materials similar or 
equivalent to those described can be used in the practice or 
testing of the present invention, the preferred methods and 
materials are now described. All publications and patent 
documents referenced in this application are incorporated 
herein by reference. 

It is understood that the examples and embodiments 
described herein are for illustrative purposes only and that 
various modifications or changes in light thereof will be 
suggested to persons skilled in the art and are to be included 
within the spirit and purview of this application and scope of 
the appended claims. 



WO 94/18332 



PCT/US94/01624 



SEQUENCE LISTING 



(1) GENERAL INFORMATION: 



(i) APPLICANT: Leppla, Stephen H. 

Klimpel, Kurt R.
Arora, Naveen 
Singh, Yogendra 
Nichols, Peter J. 



(iii) NUMBER OF SEQUENCES: 31



(A) ADDRESSEE: TOWNSEND and TOWNSEND KHOURIE and CREW

(B) STREET: Steuart Street Tower, 20th Floor, One Market Plaza

(C) CITY: San Francisco 

(D) STATE: CA 

(E) COUNTRY: USA 

(F) ZIP: 94105

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US 

(B) FILING DATE: June 25, 1993 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Weber, Kenneth A. 

(B) REGISTRATION NUMBER: 31,677 

(C) REFERENCE/DOCKET NUMBER: 15280-115

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (415) 543-9600 

(B) TELEFAX: (415) 543-5043 

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3291 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO



(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis 

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 580..2907

(D) OTHER INFORMATION: /product= "Lethal Factor" 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AAATTAGGAT TTCGGTTATG TTTAGTATTT TTTTAAAATA ATAGTATTAA ATAGTGGAAT      60
GCAAATGATA AATGGGCTTT AAACAAAACT AATGAAATAA TCTACAAATG GAATTTCTCC     120
AGTTTTAGAT TAAACCATAC CAAAAAAATC ACACTGTCAA GAAAAATGAT AGAATCCCTA     180
CACTAATTAA CATAACCAAA TTGGTAGTTA TAGGTAGAAA CTTATTTATT TCTATAATAC     240
CATGCAAAAA AGTAAATATT CTGTTCCATA CTATTTTAGT AAATTATTTA GCAAGTAAAT     300
TTTGGTGTAT AAACAAAGTT            TAAAAAATTA CTTTACTTTT ATACAGATTA     360
AAATGAAAAA TTTTTTATGA CAAGAAATAT TGCCTTTAAT TTATGAGGAA ATAAGTAAAA     420
TTTTCTACAT ACTTTATTTT ATTGTTGAAA TGTTCACTTA TAAAAAAGGA GAGATTAAAT     480
ATGAATATAA AAAAAGAATT TATAAAAGTA ATTAGTATGT CATGTTTAGT AACAGCAATT     540
ACTTTGAGTG GTCCCGTCTT TATCCCCCTT GTACAGGGG GCG GGC GGT CAT GGT        594
                                           Ala Gly Gly His Gly



GAT GTA GGT ATG CAC GTA AAA GAG AAA GAG AAA AAT AAA GAT GAG AAT 
Asp Val Gly Met His Val Lys Glu Lys Glu Lys Asn Lys Asp Glu Asn 
10 15 20 

AAG AGA AAA GAT GAA GAA CGA AAT AAA ACA CAG GAA GAG CAT TTA AAG 
Lys Arg Lys Asp Glu Glu Arg Asn Lys Thr Gln Glu Glu His Leu Lys
25 30 35 

GAA ATC ATG AAA CAC ATT GTA AAA ATA GAA GTA AAA GGG GAG GAA GCT 
Glu Ile Met Lys His Ile Val Lys Ile Glu Val Lys Gly Glu Glu Ala
40 45 50 

GTT AAA AAA GAG GCA GCA GAA AAG CTA CTT GAG AAA GTA CCA TCT GAT 
Val Lys Lys Glu Ala Ala Glu Lys Leu Leu Glu Lys Val Pro Ser Asp 
55 60 65 

GTT TTA GAG ATG TAT AAA GCA ATT GGA GGA AAG ATA TAT ATT GTG GAT 
Val Leu Glu Met Tyr Lys Ala Ile Gly Gly Lys Ile Tyr Ile Val Asp
70 75 80 85 

GGT GAT ATT ACA AAA CAT ATA TCT TTA GAA GCA TTA TCT GAA GAT AAG 
Gly Asp Ile Thr Lys His Ile Ser Leu Glu Ala Leu Ser Glu Asp Lys
90 95 100 

AAA AAA ATA AAA GAC ATT TAT GGG AAA GAT GCT TTA TTA CAT GAA CAT 
Lys Lys Ile Lys Asp Ile Tyr Gly Lys Asp Ala Leu Leu His Glu His
105 110 115

TAT GTA TAT GCA AAA GAA GGA TAT GAA CCC GTA CTT GTA ATC CAA TCT 
Tyr Val Tyr Ala Lys Glu Gly Tyr Glu Pro Val Leu Val Ile Gln Ser
120 125 130 

TCG GAA GAT TAT GTA GAA AAT ACT GAA AAG GCA CTG AAC GTT TAT TAT 
Ser Glu Asp Tyr Val Glu Asn Thr Glu Lys Ala Leu Asn Val Tyr Tyr 
135 140 145 

GAA ATA GGT AAG ATA TTA TCA AGG GAT ATT TTA AGT AAA ATT AAT CAA 
Glu Ile Gly Lys Ile Leu Ser Arg Asp Ile Leu Ser Lys Ile Asn Gln
150 155 160 165 

CCA TAT CAG AAA TTT TTA GAT GTA TTA AAT ACC ATT AAA AAT GCA TCT 
Pro Tyr Gln Lys Phe Leu Asp Val Leu Asn Thr Ile Lys Asn Ala Ser
170 175 180 

GAT TCA GAT GGA CAA GAT CTT TTA TTT ACT AAT CAG CTT AAG GAA CAT 
Asp Ser Asp Gly Gln Asp Leu Leu Phe Thr Asn Gln Leu Lys Glu His
185 190 195 

CCC ACA GAC TTT TCT GTA GAA TTC TTG GAA CAA AAT AGC AAT GAG GTA 
Pro Thr Asp Phe Ser Val Glu Phe Leu Glu Gln Asn Ser Asn Glu Val
200 205 210 

CAA GAA GTA TTT GCG AAA GCT TTT GCA TAT TAT ATC GAG CCA CAG CAT 
Gln Glu Val Phe Ala Lys Ala Phe Ala Tyr Tyr Ile Glu Pro Gln His
215 220 225 

CGT GAT GTT TTA CAG CTT TAT GCA CCG GAA GCT TTT AAT TAC ATG GAT 
Arg Asp Val Leu Gln Leu Tyr Ala Pro Glu Ala Phe Asn Tyr Met Asp
230 235 240 245 

AAA TTT AAC GAA CAA GAA ATA AAT CTA TCC TTG GAA GAA CTT AAA GAT 
Lys Phe Asn Glu Gln Glu Ile Asn Leu Ser Leu Glu Glu Leu Lys Asp
250 255 260 

CAA CGG ATG CTG TCA AGA TAT GAA AAA TGG GAA AAG ATA AAA CAG CAC 
Gln Arg Met Leu Ser Arg Tyr Glu Lys Trp Glu Lys Ile Lys Gln His
265 270 275

TAT CAA CAC TGG AGC GAT TCT TTA TCT GAA GAA GGA AGA GGA CTT TTA 
Tyr Gln His Trp Ser Asp Ser Leu Ser Glu Glu Gly Arg Gly Leu Leu
280 285 290 

AAA AAG CTG CAG ATT CCT ATT GAG CCA AAG AAA GAT GAC ATA ATT CAT 
Lys Lys Leu Gln Ile Pro Ile Glu Pro Lys Lys Asp Asp Ile Ile His
295 300 305 

TCT TTA TCT CAA GAA GAA AAA GAG CTT CTA AAA AGA ATA CAA ATT GAT 
Ser Leu Ser Gln Glu Glu Lys Glu Leu Leu Lys Arg Ile Gln Ile Asp
310 315 320 325 

AGT AGT GAT TTT TTA TCT ACT GAG GAA AAA GAG TTT TTA AAA AAG CTA 
Ser Ser Asp Phe Leu Ser Thr Glu Glu Lys Glu Phe Leu Lys Lys Leu 
330 335 340 

CAA ATT GAT ATT CGT GAT TCT TTA TCT GAA GAA GAA AAA GAG CTT TTA 
Gln Ile Asp Ile Arg Asp Ser Leu Ser Glu Glu Glu Lys Glu Leu Leu
345 350 355 

AAT AGA ATA CAG GTG GAT AGT AGT AAT CCT TTA TCT GAA AAA GAA AAA 
Asn Arg Ile Gln Val Asp Ser Ser Asn Pro Leu Ser Glu Lys Glu Lys
360 365 370 

GAG TTT TTA AAA AAG CTG AAA CTT GAT ATT CAA CCA TAT GAT ATT AAT 
Glu Phe Leu Lys Lys Leu Lys Leu Asp Ile Gln Pro Tyr Asp Ile Asn
375 380 385 

CAA AGG TTG CAA GAT ACA GGA GGG TTA ATT GAT AGT CCG TCA ATT AAT 
Gln Arg Leu Gln Asp Thr Gly Gly Leu Ile Asp Ser Pro Ser Ile Asn
390 395 400 405

CTT GAT GTA AGA AAG CAG TAT AAA AGG GAT ATT CAA AAT ATT GAT GCT 
Leu Asp Val Arg Lys Gln Tyr Lys Arg Asp Ile Gln Asn Ile Asp Ala
410 415 420 

TTA TTA CAT CAA TCC ATT GGA AGT ACC TTG TAC AAT AAA ATT TAT TTG 
Leu Leu His Gln Ser Ile Gly Ser Thr Leu Tyr Asn Lys Ile Tyr Leu
425 430 435 

TAT GAA AAT ATG AAT ATC AAT AAC CTT ACA GCA ACC CTA GGT GCG GAT 
Tyr Glu Asn Met Asn Ile Asn Asn Leu Thr Ala Thr Leu Gly Ala Asp
440 445 450 

TTA GTT GAT TCC ACT GAT AAT ACT AAA ATT AAT AGA GGT ATT TTC AAT 
Leu Val Asp Ser Thr Asp Asn Thr Lys Ile Asn Arg Gly Ile Phe Asn
GAA TTC AAA AAA AAT TTC AAA TAT AGT ATT TCT AGT AAC TAT ATG ATT
Glu Phe Lys Lys Asn Phe Lys Tyr Ser Ile Ser Ser Asn Tyr Met Ile
470 475 480 485 

GTT GAT ATA AAT GAA AGG CCT GCA TTA GAT AAT GAG CGT TTG AAA TGG 
Val Asp Ile Asn Glu Arg Pro Ala Leu Asp Asn Glu Arg Leu Lys Trp
490 495 500 

AGA ATC GAA TTA TCA CCA GAT ACT CGA GCA GGA TAT TTA GAA AAT GGA 
Arg Ile Gln Leu Ser Pro Asp Thr Arg Ala Gly Tyr Leu Glu Asn Gly
505 510 515 

AAG CTT ATA TTA CAA AGA AAC ATC GGT CTG GAA ATA AAG GAT GTA CAA 
Lys Leu Ile Leu Gln Arg Asn Ile Gly Leu Glu Ile Lys Asp Val Gln
520 525 530 

ATA ATT AAG CAA TCC GAA AAA GAA TAT ATA AGG ATT GAT GCG AAA GTA 
Ile Ile Lys Gln Ser Glu Lys Glu Tyr Ile Arg Ile Asp Ala Lys Val
535 540 545 

GTG CCA AAG AGT AAA ATA GAT ACA AAA ATT CAA GAA GCA CAG TTA AAT 
Val Pro Lys Ser Lys Ile Asp Thr Lys Ile Gln Glu Ala Gln Leu Asn
550 555 560 565 

ATA AAT CAG GAA TGG AAT AAA GCA TTA GGG TTA CCA AAA TAT ACA AAG 
Ile Asn Gln Glu Trp Asn Lys Ala Leu Gly Leu Pro Lys Tyr Thr Lys
570 575 580 

CTT ATT ACA TTC AAC GTG CAT AAT AGA TAT GCA TCC AAT ATT GTA GAA 
Leu Ile Thr Phe Asn Val His Asn Arg Tyr Ala Ser Asn Ile Val Glu
585 590 595 

AGT GCT TAT TTA ATA TTG AAT GAA TGG AAA AAT AAT ATT CAA AGT GAT 
Ser Ala Tyr Leu Ile Leu Asn Glu Trp Lys Asn Asn Ile Gln Ser Asp
600 605 610 

CTT ATA AAA AAG GTA ACA AAT TAC TTA GTT GAT GGT AAT GGA AGA TTT 
Leu Ile Lys Lys Val Thr Asn Tyr Leu Val Asp Gly Asn Gly Arg Phe
615 620 625 

GTT TTT ACC GAT ATT ACT CTC CCT AAT ATA GCT GAA CAA TAT ACA CAT 
Val Phe Thr Asp Ile Thr Leu Pro Asn Ile Ala Glu Gln Tyr Thr His
630 635 640 645 

CAA GAT GAG ATA TAT GAG CAA GTT CAT TCA AAA GGG TTA TAT GTT CCA 
Gln Asp Glu Ile Tyr Glu Gln Val His Ser Lys Gly Leu Tyr Val Pro
650 655 660 

GAA TCC CGT TCT ATA TTA CTC CAT GGA CCT TCA AAA GGT GTA GAA TTA 
Glu Ser Arg Ser Ile Leu Leu His Gly Pro Ser Lys Gly Val Glu Leu
665 670 675 

AGG AAT GAT AGT GAG GGT TTT ATA CAC GAA TTT GGA CAT GCT GTG GAT 
Arg Asn Asp Ser Glu Gly Phe Ile His Glu Phe Gly His Ala Val Asp
680 685 690 

GAT TAT GCT GGA TAT CTA TTA GAT AAG AAC CAA TCT GAT TTA GTT ACA 
Asp Tyr Ala Gly Tyr Leu Leu Asp Lys Asn Gln Ser Asp Leu Val Thr
695 700 705 

AAT TCT AAA AAA TTC ATT GAT ATT TTT AAG GAA GAA GGG AGT AAT TTA 
Asn Ser Lys Lys Phe Ile Asp Ile Phe Lys Glu Glu Gly Ser Asn Leu
710 715 720 725 

ACT TCG TAT GGG AGA ACA AAT GAA GCG GAA TTT TTT GCA GAA GCC TTT 
Thr Ser Tyr Gly Arg Thr Asn Glu Ala Glu Phe Phe Ala Glu Ala Phe 
730 735 740 
AGG TTA ATG CAT TCT ACG GAC CAT GCT GAA CGT TTA AAA GTT CAA AAA 2850
Arg Leu Met His Ser Thr Asp His Ala Glu Arg Leu Lys Val Gln Lys
745 750 755 

AAT GCT CCG AAA ACT TTC CAA TTT ATT AAC GAT CAG ATT AAG TTC ATT 2898 
Asn Ala Pro Lys Thr Phe Gln Phe Ile Asn Asp Gln Ile Lys Phe Ile
760 765 770 

ATT AAC TCA TAAGTAATGT ATTAAAAATT TTCAAATGGA TTTAATAATA 2947 
Ile Asn Ser
775 

ATAATAATAA TAATAATAAC GGGACCAGCC ATTATGAAGC AACTAATTCT AGACTTGATA 3007 

GTAATTCTTG GGAAGCACCA GATAGTGTAA AAGGTGGCAT TGCCAGAATG ATATTTTATG 3067 

TGTTCGTTAG ATATGAAGGC AAAAACAATG ATCCTGACCT AGAACTTAAT GATAATGTTA 3127 

TTAATAATTT AATGCCTTTT ATAGGAATAT TAGTAAAAGT GCCGAAAAGA TCCTGTTGCA 3187 

AAGCTTTTAA AGAACATATT ATTCTATCAA GTGGCTGTAT ATTTTGTGTA ATTTTCAATA 3247 

AATTTTGTAA TTAAGCATAC GTCAAAAAAC CGAAATCTGA GCTC 3291 

(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 776 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Ala Gly Gly His Gly Asp Val Gly Met His Val Lys Glu Lys Glu Lys 
1 5 10 15

Asn Lys Asp Glu Asn Lys Arg Lys Asp Glu Glu Arg Asn Lys Thr Gln
20 25 30 

Glu Glu His Leu Lys Glu Ile Met Lys His Ile Val Lys Ile Glu Val
35 40 45 

Lys Gly Glu Glu Ala Val Lys Lys Glu Ala Ala Glu Lys Leu Leu Glu 
50 55 60 

Lys Val Pro Ser Asp Val Leu Glu Met Tyr Lys Ala Ile Gly Gly Lys
65 70 75 80 

Ile Tyr Ile Val Asp Gly Asp Ile Thr Lys His Ile Ser Leu Glu Ala
85 90 95 

Ile Lys Asn Ala Ser Asp Ser Asp Gly Gln Asp Leu Leu Phe Thr Asn
180 185 190 

Gln Leu Lys Glu His Pro Thr Asp Phe Ser Val Glu Phe Leu Glu Gln
195 200 205 

Asn Ser Asn Glu Val Gln Glu Val Phe Ala Lys Ala Phe Ala Tyr Tyr
210 215 220 

Ile Glu Pro Gln His Arg Asp Val Leu Gln Leu Tyr Ala Pro Glu Ala
225 230 235 240 

Phe Asn Tyr Met Asp Lys Phe Asn Glu Gln Glu Ile Asn Leu Ser Leu
245 250 255 

Glu Glu Leu Lys Asp Gln Arg Met Leu Ser Arg Tyr Glu Lys Trp Glu
260 265 270 

Lys Ile Lys Gln His Tyr Gln His Trp Ser Asp Ser Leu Ser Glu Glu
275 280 285 

Gly Arg Gly Leu Leu Lys Lys Leu Gln Ile Pro Ile Glu Pro Lys Lys
290 295 300 

Asp Asp Ile Ile His Ser Leu Ser Gln Glu Glu Lys Glu Leu Leu Lys
305 310 315 320 

Arg Ile Gln Ile Asp Ser Ser Asp Phe Leu Ser Thr Glu Glu Lys Glu
325 330 335 

Phe Leu Lys Lys Leu Gln Ile Asp Ile Arg Asp Ser Leu Ser Glu Glu
340 345 350 

Glu Lys Glu Leu Leu Asn Arg Ile Gln Val Asp Ser Ser Asn Pro Leu
355 360 365 

Ser Glu Lys Glu Lys Glu Phe Leu Lys Lys Leu Lys Leu Asp Ile Gln
370 375 380

Pro Tyr Asp Ile Asn Gln Arg Leu Gln Asp Thr Gly Gly Leu Ile Asp
385 390 395 400 

Ser Pro Ser Ile Asn Leu Asp Val Arg Lys Gln Tyr Lys Arg Asp Ile
405 410 415 

Gln Asn Ile Asp Ala Leu Leu His Gln Ser Ile Gly Ser Thr Leu Tyr
420 425 430 

Asn Lys Ile Tyr Leu Tyr Glu Asn Met Asn Ile Asn Asn Leu Thr Ala
435 440 445 

Thr Leu Gly Ala Asp Leu Val Asp Ser Thr Asp Asn Thr Lys Ile Asn
450 455 460 

Arg Gly Ile Phe Asn Glu Phe Lys Lys Asn Phe Lys Tyr Ser Ile Ser
465 470 475 480 

Ser Asn Tyr Met Ile Val Asp Ile Asn Glu Arg Pro Ala Leu Asp Asn
485 490 495 

Glu Arg Leu Lys Trp Arg Ile Gln Leu Ser Pro Asp Thr Arg Ala Gly
500 505 510 

Tyr Leu Glu Asn Gly Lys Leu Ile Leu Gln Arg Asn Ile Gly Leu Glu
515 520 525 

Ile Lys Asp Val Gln Ile Ile Lys Gln Ser Glu Lys Glu Tyr Ile Arg
530 535 540 

Ile Asp Ala Lys Val Val Pro Lys Ser Lys Ile Asp Thr Lys Ile Gln
545 550 555 560 

Glu Ala Gln Leu Asn Ile Asn Gln Glu Trp Asn Lys Ala Leu Gly Leu
565 570 575 

Pro Lys Tyr Thr Lys Leu Ile Thr Phe Asn Val His Asn Arg Tyr Ala
580 585 590 

Ser Asn Ile Val Glu Ser Ala Tyr Leu Ile Leu Asn Glu Trp Lys Asn
595 600 605 

Asn Ile Gln Ser Asp Leu Ile Lys Lys Val Thr Asn Tyr Leu Val Asp
610 615 620 

Gly Asn Gly Arg Phe Val Phe Thr Asp Ile Thr Leu Pro Asn Ile Ala
625 630 635 640 

Glu Gln Tyr Thr His Gln Asp Glu Ile Tyr Glu Gln Val His Ser Lys
645 650 655 

Gly Leu Tyr Val Pro Glu Ser Arg Ser Ile Leu Leu His Gly Pro Ser
660 665 670 

Lys Gly Val Glu Leu Arg Asn Asp Ser Glu Gly Phe Ile His Glu Phe
675 680 685 

Gly His Ala Val Asp Asp Tyr Ala Gly Tyr Leu Leu Asp Lys Asn Gln
690 695 700 

Ser Asp Leu Val Thr Asn Ser Lys Lys Phe Ile Asp Ile Phe Lys Glu
705 710 715 720 



Phe Ala Glu Ala Phe Arg Leu Met His Ser Thr Asp His Ala Glu Arg 
740 745 750 



(2) INFORMATION FOR SEQ ID NO : 3 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4235 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1891..4095

(D) OTHER INFORMATION: /product= "Protective Antigen" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 

AAGCTTCTGT CATTCGTAAA TTTCAAATAG AACGTAAATT TAGACTTCTC ATCATTAAAA 60

ATGAAAAATC TTATCTTTTT GATTCTATTG TATATTTTTA TTAAGGTGTT TAATAGTTAG 120 

AAAAGACAGT TGATGCTATT ACTCCAGATA AAATATAGCT AACCATAAAT TTATTAAAGA 180 

AACCTTGTTG TTCTAAATAA TGATTTTGTG GATTCCGGAA TAGATACTGG TGAGTTAGCT 240 

CTAATTTTAT AGTGATTTAA CTAACAATTT ATAAAGCAGC ATAATTCAAA TTTTTTAATT 300 

GATTTTTCCT GAAGCATAGT ATAAAAGAGT CAAGGTCTTC TAGACTTGAC TCTTGGAATC 360

ATTAGGAATT AACAATATAT ATAATGCGCT AGACAGAATC AAATTAAATG CAAAAATGAA 420 

TATTTTAGTA AGAGATCCAT ATCATTATGA TAATAACGGT AATATTGTAG GGGTTGATGA 480 

TTCATATTTA AAAAACGCAT ATAAGCAAAT ACTTAATTGG TCAAGCGATG GAGTTTCTTT 540

AAATCTAGAT GAAGATGTAA ATCAAGCACT ATCTGGATAT ATGCTTCAAA TAAAAAAACC 600 

TTCAAACCAC CTAACAAACA GCCCAGTTAC AATTACATTA GCAGGCAAGG ACAGTGGTGT 660 

TGGAGAATTG TATAGAGTAT TATCAGATGG AGCAGGATTC CTGGATTTCA ATAAGTTTGA 720 

TGAAAATTGG CGATCATTAG TAGATCCTGG TGATGATGTT TATGTGTATG CTGTTACTAA 780 

AGAAGATTTT AATGCAGTTA CTCGAGATGA AAATGGTAAT ATAGCGAATA AATTAAAAAA 840

CACCTTAGTT TTATCGGGTA AAATAAAAGA AATAAACATA AAAACTACAA ATATTAATAT 900 

ATTTGTAGTT TTTATGTTTA TTATATACCT CCTATTTTAT ATTATTAGTA GCACAGTTTT 960 

TGCAAATCAT GTAATTGTAT ACTTATCTAT GTAGAGGTAT CACAACTTAT GAATAGTGTA 1020

TTTTATTGAA CGTTGGTTAG CTTGGACAGT TGTATGGATA TGCATACTTT ATAACGTATA 1080 

AAATTTCACG CACCACAATA AAACTAATTT AACAAAAACA AAAACACACC TAAGATCATT 1140 

CAGTTCTTTT AATAAGGAGC TGCCCACCAA GCTAAACCTA AATAATCTTT GTTTCACATA 1200 

AGGTTTTTTT CTAAATATAC AGTGTAAGTT ATTGTGAATT TAACCAGTAT ATATTAAAAA 1260 

TGTTTTATGT TAACAAATTA AATTGTAAAA CCCCTCTTAA GCATAGTTAA GAGGGGTAGG 1320 

TTTTAAATTT TTTGTTGAAA TTAGAAAAAA TAATAAAAAA ACAAACCTAT 1TTCTTTCAG 1380 

GTTGTTTTTG GGTTACAAAA CAAAAAGAAA ACATGTTTCA AGGTACAATA ATTATGGTTC 1440

TTTAGCTTTC TGTAAAACAG CCTTAATAGT TGGATTTATG ACTATTAAAG TTAGTATACA 1500 

GCATACACAA TCTATTGAAG GATATTTATA ATGCAATTCC CTAAAAATAG TTTTGTATAA 1560 

CCAGTTCTTT TATCCGAACT GATACACGTA TTTTAGCATA ATTTTTAATG TATCTTCAAA 1620

AACAGCTTCT GTGTCCTTTT CTATTAAACA TATAAATTCT TTTTTATGTT ATATATTTAT 1680 

AAAAGTTCTG TTTAAAAAGC CAAAAATAAA TAATTATCTC TTTTTATTTA TATTATATTG 1740

AAACTAAAGT TTATTAATTT CAATATAATA TAAATTTAAT TTTATACAAA AAGGAGAACG 1800 

TATATGAAAA AACGAAAAGT GTTAATACCA TTAATGGCAT TGTCTACGAT ATTAGTTTCA 1860 

AGCACAGGTA ATTTAGAGGT GATTCAGGCA GAA GTT AAA CAG GAG AAC CGG TTA 1914 
Glu Val Lys Gln Glu Asn Arg Leu
1 5 

TTA AAT GAA TCA GAA TCA AGT TCC CAG GGG TTA CTA GGA TAC TAT TTT 1962 
Leu Asn Glu Ser Glu Ser Ser Ser Gln Gly Leu Leu Gly Tyr Tyr Phe
10 15 20

AGT GAT TTG AAT TTT CAA GCA CCC ATG GTG GTT ACC TCT TCT ACT ACA
Ser Asp Leu Asn Phe Gln Ala Pro Met Val Val Thr Ser Ser Thr Thr
25 30 35 40 

GGG GAT TTA TCT ATT CCT AGT TCT GAG TTA GAR AAT ATT CCA TCG GAA
Gly Asp Leu Ser Ile Pro Ser Ser Glu Leu Glu Asn Ile Pro Ser Glu
45 50 55 

AAC CAA TAT TTT CAA TCT GCT ATT TGG TCA GGA TTT ATC AAA GTT AAG 
Asn Gln Tyr Phe Gln Ser Ala Ile Trp Ser Gly Phe Ile Lys Val Lys
60 65 70 

AAG AGT GAT GAA TAT ACA TTT GCT ACT TCC GCT GAT AAT CAT GTA ACA 
Lys Ser Asp Glu Tyr Thr Phe Ala Thr Ser Ala Asp Asn His Val Thr 
75 80 85 

ATG TGG GTA GAT GAC CAA GAA GTG ATT AAT AAA GCT TCT AAT TCT AAC 
Met Trp Val Asp Asp Gln Glu Val Ile Asn Lys Ala Ser Asn Ser Asn
90 95 100 

AAA ATC AGA TTA GAA AAA GGA AGA TTA TAT CAA ATA AAA ATT CAA TAT 
Lys Ile Arg Leu Glu Lys Gly Arg Leu Tyr Gln Ile Lys Ile Gln Tyr
105 110 115 120

CAA CGA GAA AAT CCT ACT GAA AAA GGA TTG GAT TTC AAG TTG TAC TGG 
Gln Arg Glu Asn Pro Thr Glu Lys Gly Leu Asp Phe Lys Leu Tyr Trp
125 130 135 

ACC GAT TCT CAA AAT AAA AAA GAA GTG ATT TCT AGT GAT AAC TTA CAA 
Thr Asp Ser Gln Asn Lys Lys Glu Val Ile Ser Ser Asp Asn Leu Gln
140 145 150 

TTG CCA GAA TTA AAA CAA AAA TCT TCG AAC TCA AGA AAA AAG CGA AGT 
Leu Pro Glu Leu Lys Gln Lys Ser Ser Asn Ser Arg Lys Lys Arg Ser
155 160 165 

ACA AGT GCT GGA CCT ACG GTT CCA GAC CGT GAC AAT GAT GGA ATC CCT 
Thr Ser Ala Gly Pro Thr Val Pro Asp Arg Asp Asn Asp Gly Ile Pro
170 175 180 

GAT TCA TTA GAG GTA GAA GGA TAT ACG GTT GAT GTC AAA AAT AAA AGA 
Asp Ser Leu Glu Val Glu Gly Tyr Thr Val Asp Val Lys Asn Lys Arg 
185 190 195 200 

ACT TTT CTT TCA CCA TGG ATT TCT AAT ATT CAT GAA AAG AAA GGA TTA 
Thr Phe Leu Ser Pro Trp Ile Ser Asn Ile His Glu Lys Lys Gly Leu
205 210 215 

ACC AAA TAT AAA TCA TCT CCT GAA AAA TGG AGC ACG GCT TCT GAT CCG 
Thr Lys Tyr Lys Ser Ser Pro Glu Lys Trp Ser Thr Ala Ser Asp Pro 
220 225 230 

TAC AGT GAT TTC GAA AAG GTT ACA GGA CGG ATT GAT AAG AAT GTA TCA 
Tyr Ser Asp Phe Glu Lys Val Thr Gly Arg Ile Asp Lys Asn Val Ser
235 240 245 

CCA GAG GCA AGA CAC CCC CTT GTG GCA GCT TAT CCG ATT GTA CAT GTA 
Pro Glu Ala Arg His Pro Leu Val Ala Ala Tyr Pro Ile Val His Val
250 255 260 

GAT ATG GAG AAT ATT ATT CTC TCA AAA AAT GAG GAT CAA TCC ACA CAG 
Asp Met Glu Asn Ile Ile Leu Ser Lys Asn Glu Asp Gln Ser Thr Gln
265 270 275 280 

AAT ACT GAT AGT GAA ACG AGA ACA ATA AGT AAA AAT ACT TCT ACA AGT 
Asn Thr Asp Ser Glu Thr Arg Thr Ile Ser Lys Asn Thr Ser Thr Ser
285 290 295 

AGG ACA CAT ACT AGT GAA GTA CAT GGA AAT GCA GAA GTG CAT GCG TCG

WO 94/18332   PCT/US94/01624

TTC TTT GAT ATT GGT GGG AGT GTA TCT GCA GGA TTT AGT AAT TCG AAT 
Phe Phe Asp Ile Gly Gly Ser Val Ser Ala Gly Phe Ser Asn Ser Asn
315 320 325 

TCA AGT ACG GTC GCA ATT GAT CAT TCA CTA TCT CTA GCA GGG GAA AGA 
Ser Ser Thr Val Ala Ile Asp His Ser Leu Ser Leu Ala Gly Glu Arg
330 335 340

ACT TGG GCT GAA ACA ATG GGT TTA AAT ACC GCT GAT ACA GCA AGA TTA
Thr Trp Ala Glu Thr Met Gly Leu Asn Thr Ala Asp Thr Ala Arg Leu
345 350 355 360

AAT GCC AAT ATT AGA TAT GTA AAT ACT GGG ACG GCT CCA ATC TAC AAC
Asn Ala Asn Ile Arg Tyr Val Asn Thr Gly Thr Ala Pro Ile Tyr Asn
365 370 375

GTG TTA CCA ACG ACT TCG TTA GTG TTA GGA AAA AAT CAA ACA CTC GCG
Val Leu Pro Thr Thr Ser Leu Val Leu Gly Lys Asn Gln Thr Leu Ala
380 385 390 

ACA ATT AAA GCT AAG GAA AAC CAA TTA AGT CAA ATA CTT GCA CCT AAT 
Thr Ile Lys Ala Lys Glu Asn Gln Leu Ser Gln Ile Leu Ala Pro Asn
395 400 405 

AAT TAT TAT CCT TCT AAA AAC TTG GCG CCA ATC GCA TTA AAT GCA CAA 
Asn Tyr Tyr Pro Ser Lys Asn Leu Ala Pro Ile Ala Leu Asn Ala Gln
410 415 420

GAC GAT TTC AGT TCT ACT CCA ATT ACA ATG AAT TAC AAT CAA TTT CTT 
Asp Asp Phe Ser Ser Thr Pro Ile Thr Met Asn Tyr Asn Gln Phe Leu
425 430 435 440

GAG TTA GAA AAA ACG AAA CAA TTA AGA TTA GAT ACG GAT CAA GTA TAT
Glu Leu Glu Lys Thr Lys Gln Leu Arg Leu Asp Thr Asp Gln Val Tyr
445 450 455 

GGG AAT ATA GCA ACA TAC AAT TTT GAA AAT GGA AGA GTG AGG GTG GAT
Gly Asn Ile Ala Thr Tyr Asn Phe Glu Asn Gly Arg Val Arg Val Asp
460 465 470 

ACA GGC TCG AAC TGG AGT GAA GTG TTA CCG CAA ATT CAA GAA ACA ACT 
Thr Gly Ser Asn Trp Ser Glu Val Leu Pro Gln Ile Gln Glu Thr Thr
475 480 485 

GCA CGT ATC ATT TTT AAT GGA AAA GAT TTA AAT CTG GTA GAA AGG CGG 
Ala Arg Ile Ile Phe Asn Gly Lys Asp Leu Asn Leu Val Glu Arg Arg
490 495 500

ATA GCG GCG GTT AAT CCT AGT GAT CCA TTA GAA ACG ACT AAA CCG GAT 
Ile Ala Ala Val Asn Pro Ser Asp Pro Leu Glu Thr Thr Lys Pro Asp
505 510 515 520

ATG ACA TTA AAA GAA GCC CTT AAA ATA GCA TTT GGA TTT AAC GAA CCG
Met Thr Leu Lys Glu Ala Leu Lys Ile Ala Phe Gly Phe Asn Glu Pro
525 530 535 

AAT GGA AAC TTA CAA TAT CAA GGG AAA GAC ATA ACC GAA TTT GAT TTT
Asn Gly Asn Leu Gln Tyr Gln Gly Lys Asp Ile Thr Glu Phe Asp Phe
540 545 550 

AAT TTC GAT CAA CAA ACA TCT CAA AAT ATC AAG AAT CAG TTA GCG GAA 
Asn Phe Asp Gln Gln Thr Ser Gln Asn Ile Lys Asn Gln Leu Ala Glu
555 560 565 

TTA AAC GCA ACT AAC ATA TAT ACT GTA TTA GAT AAA ATC AAA TTA AAT 
Leu Asn Ala Thr Asn Ile Tyr Thr Val Leu Asp Lys Ile Lys Leu Asn
570 575 580

GCA AAA ATG AAT ATT TTA ATA AGA GAT AAA CGT TTT CAT TAT GAT AGA
Ala Lys Met Asn Ile Leu Ile Arg Asp Lys Arg Phe His Tyr Asp Arg
585 590 595 600

AAT AAC ATA GCA GTT GGG GCG GAT GAG TCA GTA GTT AAG GAG GCT CAT
Asn Asn Ile Ala Val Gly Ala Asp Glu Ser Val Val Lys Glu Ala His
605 610 615

AGA GAA GTA ATT AAT TCG TCA ACA GAG GGA TTA TTG TTA AAT ATT GAT
Arg Glu Val Ile Asn Ser Ser Thr Glu Gly Leu Leu Leu Asn Ile Asp
620 625 630

AAG GAT ATA AGA AAA ATA TTA TCA GGT TAT ATT GTA GAA ATT GAA GAT
Lys Asp Ile Arg Lys Ile Leu Ser Gly Tyr Ile Val Glu Ile Glu Asp
635 640 645

ACT GAA GGG CTT AAA GAA GTT ATA AAT GAC AGA TAT GAT ATG TTG AAT
Thr Glu Gly Leu Lys Glu Val Ile Asn Asp Arg Tyr Asp Met Leu Asn
650 655 660

ATT TCT AGT TTA CGG CAA GAT GGA AAA ACA TTT ATA GAT TTT AAA AAA
Ile Ser Ser Leu Arg Gln Asp Gly Lys Thr Phe Ile Asp Phe Lys Lys
665 670 675 680

TAT AAT GAT AAA TTA CCG TTA TAT ATA AGT AAT CCC AAT TAT AAG GTA
Tyr Asn Asp Lys Leu Pro Leu Tyr Ile Ser Asn Pro Asn Tyr Lys Val
685 690 695

AAT GTA TAT GCT GTT ACT AAA GAA AAC ACT ATT ATT AAT CCT AGT GAG
Asn Val Tyr Ala Val Thr Lys Glu Asn Thr Ile Ile Asn Pro Ser Glu
700 705 710

AAT GGG GAT ACT AGT ACC AAC GGG ATC AAG AAA ATT TTA ATC TTT TCT
Asn Gly Asp Thr Ser Thr Asn Gly Ile Lys Lys Ile Leu Ile Phe Ser
715 720 725

AAA AAA GGC TAT GAG ATA GGA TAAGGTAATT CTAGGTGATT TTTAAATTAT 4125
Lys Lys Gly Tyr Glu Ile Gly
730 735 

CTAAAAAACA GTAAAATTAA AACATACTCT TTTTGTAAGA AATACAAGGA GAGTATGTTT 4185 

TAAACAGTAA TCTAAATCAT CATAATCCTT TGAGATTGTT TGTAGGATCC 4235 

(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 735 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 

Glu Val Lys Gln Glu Asn Arg Leu Leu Asn Glu Ser Glu Ser Ser Ser
1 5 10 15 

Gln Gly Leu Leu Gly Tyr Tyr Phe Ser Asp Leu Asn Phe Gln Ala Pro
20 25 30 

Met Val Val Thr Ser Ser Thr Thr Gly Asp Leu Ser Ile Pro Ser Ser
35 40 45 

Glu Leu Glu Asn Ile Pro Ser Glu Asn Gln Tyr Phe Gln Ser Ala Ile
50 55 60 

Trp Ser Gly Phe Ile Lys Val Lys Lys Ser Asp Glu Tyr Thr Phe Ala
65 70 75 80

Thr Ser Ala Asp Asn His Val Thr Met Trp Val Asp Asp Gln Glu Val
85 90 95 

Ile Asn Lys Ala Ser Asn Ser Asn Lys Ile Arg Leu Glu Lys Gly Arg
100 105 110 

Leu Tyr Gln Ile Lys Ile Gln Tyr Gln Arg Glu Asn Pro Thr Glu Lys
115 120 125 

Gly Leu Asp Phe Lys Leu Tyr Trp Thr Asp Ser Gln Asn Lys Lys Glu
130 135 140 



Asp Arg Asp Asn Asp Gly Ile Pro Asp Ser Leu Glu Val Glu Gly Tyr
180 185 190 



Asn Ile His Glu Lys Lys Gly Leu Thr Lys Tyr Lys Ser Ser Pro Glu
210 215 220 



Ala Ala Tyr Pro Ile Val His Val Asp Met Glu Asn Ile Ile Leu Ser
260 265 270 



Ser Leu Ser Leu Ala Gly Glu Arg Thr Trp Ala Glu Thr Met Gly Leu 
340 345 350 

Thr Met Asn Tyr Asn Gln Phe Leu Glu Leu Glu Lys Thr Lys Gln Leu
435 440 445 

Arg Leu Asp Thr Asp Gln Val Tyr Gly Asn Ile Ala Thr Tyr Asn Phe
450 455 460 

Glu Asn Gly Arg Val Arg Val Asp Thr Gly Ser Asn Trp Ser Glu Val 
465 470 475 480 

Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg Ile Ile Phe Asn Gly Lys
485 490 495 

Asp Leu Asn Leu Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser Asp
500 505 510 

Pro Leu Glu Thr Thr Lys Pro Asp Met Thr Leu Lys Glu Ala Leu Lys 
515 520 525 

Ile Ala Phe Gly Phe Asn Glu Pro Asn Gly Asn Leu Gln Tyr Gln Gly
530 535 540

Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe Asp Gln Gln Thr Ser Gln
545 550 555 560 

Asn Ile Lys Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr Thr
565 570 575 

Val Leu Asp Lys Ile Lys Leu Asn Ala Lys Met Asn Ile Leu Ile Arg
580 585 590 

Asp Lys Arg Phe His Tyr Asp Arg Asn Asn Ile Ala Val Gly Ala Asp
595 600 605 

Glu Ser Val Val Lys Glu Ala His Arg Glu Val Ile Asn Ser Ser Thr
610 615 620 

Glu Gly Leu Leu Leu Asn Ile Asp Lys Asp Ile Arg Lys Ile Leu Ser
625 630 635 640 

Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu Gly Leu Lys Glu Val Ile
645 650 655 

Asn Asp Arg Tyr Asp Met Leu Asn Ile Ser Ser Leu Arg Gln Asp Gly
660 665 670 

Lys Thr Phe Ile Asp Phe Lys Lys Tyr Asn Asp Lys Leu Pro Leu Tyr
675 680 685 

Ile Ser Asn Pro Asn Tyr Lys Val Asn Val Tyr Ala Val Thr Lys Glu
690 695 700 

Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly Asp Thr Ser Thr Asn Gly
705 710 715 720 

Ile Lys Lys Ile Leu Ile Phe Ser Lys Lys Gly Tyr Glu Ile Gly
725 730 735 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1368 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE:



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 1..1368

(D) OTHER INFORMATION: /product= "LF (1-254)--TR--PE (401-602)"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 

GCG GGC GGT CAT GGT GAT GTA GGT ATG CAC GTA AAA GAG AAA GAG AAA 
Ala Gly Gly His Gly Asp Val Gly Met His Val Lys Glu Lys Glu Lys 
1 5 10 15

AAT AAA GAT GAG AAT AAG AGA AAA GAT GAA GAA CGA AAT AAA ACA CAG 
Asn Lys Asp Glu Asn Lys Arg Lys Asp Glu Glu Arg Asn Lys Thr Gln
20 25 30 

GAA GAG CAT TTA AAG GAA ATC ATG AAA CAC ATT GTA AAA ATA GAA GTA 
Glu Glu His Leu Lys Glu Ile Met Lys His Ile Val Lys Ile Glu Val
35 40 45 

AAA GGG GAG GAA GCT GTT AAA AAA GAG GCA GCA GAA AAG CTA CTT GAG 
Lys Gly Glu Glu Ala Val Lys Lys Glu Ala Ala Glu Lys Leu Leu Glu 
50 55 60

AAA GTA CCA TCT GAT GTT TTA GAG ATG TAT AAA GCA ATT GGA GGA AAG 
Lys Val Pro Ser Asp Val Leu Glu Met Tyr Lys Ala Ile Gly Gly Lys
65 70 75 80 

ATA TAT ATT GTG GAT GGT GAT ATT ACA AAA CAT ATA TCT TTA GAA GCA 
Ile Tyr Ile Val Asp Gly Asp Ile Thr Lys His Ile Ser Leu Glu Ala
85 90 95 

TTA TCT GAA GAT AAG AAA AAA ATA AAA GAC ATT TAT GGG AAA GAT GCT 
Leu Ser Glu Asp Lys Lys Lys Ile Lys Asp Ile Tyr Gly Lys Asp Ala
100 105 110 

TTA TTA CAT GAA CAT TAT GTA TAT GCA AAA GAA GGA TAT GAA CCC GTA 
Leu Leu His Glu His Tyr Val Tyr Ala Lys Glu Gly Tyr Glu Pro Val 
115 120 125 

CTT GTA ATC CAA TCT TCG GAA GAT TAT GTA GAA AAT ACT GAA AAG GCA 
Leu Val Ile Gln Ser Ser Glu Asp Tyr Val Glu Asn Thr Glu Lys Ala
130 135 140 

CTG AAC GTT TAT TAT GAA ATA GGT AAG ATA TTA TCA AGG GAT ATT TTA 
Leu Asn Val Tyr Tyr Glu Ile Gly Lys Ile Leu Ser Arg Asp Ile Leu
145 150 155 160

AGT AAA ATT AAT CAA CCA TAT CAG AAA TTT TTA GAT GTA TTA AAT ACC
Ser Lys Ile Asn Gln Pro Tyr Gln Lys Phe Leu Asp Val Leu Asn Thr
165 170 175 

ATT AAA AAT GCA TCT GAT TCA GAT GGA CAA GAT CTT TTA TTT ACT AAT 
Ile Lys Asn Ala Ser Asp Ser Asp Gly Gln Asp Leu Leu Phe Thr Asn
180 185 190

CAG CTT AAG GAA CAT CCC ACA GAC TTT TCT GTA GAA TTC TTG GAA CAA 
Gln Leu Lys Glu His Pro Thr Asp Phe Ser Val Glu Phe Leu Glu Gln
195 200 205 

AAT AGC AAT GAG GTA CAA GAA GTA TTT GCG AAA GCT TTT GCA TAT TAT 
Asn Ser Asn Glu Val Gln Glu Val Phe Ala Lys Ala Phe Ala Tyr Tyr
210 215 220 

ATC GAG CCA CAG CAT CGT GAT GTT TTA CAG CTT TAT GCA CCG GAA GCT 
Ile Glu Pro Gln His Arg Asp Val Leu Gln Leu Tyr Ala Pro Glu Ala
225 230 235 240 

TTT AAT TAC ATG GAT AAA TTT AAC GAA CAA GAA ATA AAT CTA CTC GGC 
Phe Asn Tyr Met Asp Lys Phe Asn Glu Gln Glu Ile Asn Leu Leu Gly
245 250 255 

GAC GGC GGC GAC GTC AGC TTC AGC ACC CGC GGC ACG CAG AAC TGG ACG 
Asp Gly Gly Asp Val Ser Phe Ser Thr Arg Gly Thr Gln Asn Trp Thr
260 265 270 

GTG GAG CGG CTG CTC CAG GCG CAC CGC CAA CTG GAG GAG CGC GGC TAT 
Val Glu Arg Leu Leu Gln Ala His Arg Gln Leu Glu Glu Arg Gly Tyr
275 280 285 

GTG TTC GTC GGC TAC CAC GGC ACC TTC CTC GAA GCG GCG CAA AGC ATC
Val Phe Val Gly Tyr His Gly Thr Phe Leu Glu Ala Ala Gln Ser Ile
290 295 300 

GTC TTC GGC GGG GTG CGC GCG CGC AGC CAG GAC CTC GAC GCG ATC TGG 
Val Phe Gly Gly Val Arg Ala Arg Ser Gln Asp Leu Asp Ala Ile Trp
305 310 315 320 

CGC GGT TTC TAT ATC GCC GGC GAT CCG GCG CTG GCC TAC GGC TAC GCC 
Arg Gly Phe Tyr Ile Ala Gly Asp Pro Ala Leu Ala Tyr Gly Tyr Ala
325 330 335 

CAG GAC CAG GAA CCC GAC GCA CGC GGC CGG ATC CGC AAC GGT GCC CTG 
Gln Asp Gln Glu Pro Asp Ala Arg Gly Arg Ile Arg Asn Gly Ala Leu
340 345 350 

CTG CGG GTC TAT GTG CCG CGC TCG AGC CTG CCG GGC TTC TAC CGC ACC 
Leu Arg Val Tyr Val Pro Arg Ser Ser Leu Pro Gly Phe Tyr Arg Thr 
355 360 365 

AGC CTG ACC CTG GCC GCG CCG GAG GCG GCG GGC GAG GTC GAA CGG CTG 
Ser Leu Thr Leu Ala Ala Pro Glu Ala Ala Gly Glu Val Glu Arg Leu 
370 375 380 

ATC GGC CAT CCG CTG CCG CTG CGC CTG GAC GCC ATC ACC GGC CCC GAG 
Ile Gly His Pro Leu Pro Leu Arg Leu Asp Ala Ile Thr Gly Pro Glu
385 390 395 400 

GAG GAA GGC GGG CGC CTG GAG ACC ATT CTC GGC TGG CCG CTG GCC GAG 
Glu Glu Gly Gly Arg Leu Glu Thr Ile Leu Gly Trp Pro Leu Ala Glu
405 410 415 

CGC ACC GTG GTG ATT CCC TCG GCG ATC CCC ACC GAC CCG CGC AAC GTC 
Arg Thr Val Val Ile Pro Ser Ala Ile Pro Thr Asp Pro Arg Asn Val
420 425 430 

GGC GGC GAC CTC GAC CCG TCC AGC ATC CCC GAC AAG GAA CAG GCG ATC
Gly Gly Asp Leu Asp Pro Ser Ser Ile Pro Asp Lys Glu Gln Ala Ile
435 440 445

AGC GCC CTG CCG GAC TAC GCC AGC
Ser Ala Leu Pro Asp Tyr Ala Ser 
450 455 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 456 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 

Ala Gly Gly His Gly Asp Val Gly Met His Val Lys Glu Lys Glu Lys 
1 5 10 15

Asn Lys Asp Glu Asn Lys Arg Lys Asp Glu Glu Arg Asn Lys Thr Gln
20 25 30 

Glu Glu His Leu Lys Glu Ile Met Lys His Ile Val Lys Ile Glu Val
35 40 45 

Lys Gly Glu Glu Ala Val Lys Lys Glu Ala Ala Glu Lys Leu Leu Glu 
50 55 60 

Lys Val Pro Ser Asp Val Leu Glu Met Tyr Lys Ala Ile Gly Gly Lys
65 70 75 80 

Ile Tyr Ile Val Asp Gly Asp Ile Thr Lys His Ile Ser Leu Glu Ala
85 90 95 

Val Glu Arg Leu Leu Gln Ala His Arg Gln Leu Glu Glu Arg Gly Tyr
275 280 285 

Val Phe Val Gly Tyr His Gly Thr Phe Leu Glu Ala Ala Gln Ser Ile
290 295 300

Val Phe Gly Gly Val Arg Ala Arg Ser Gln Asp Leu Asp Ala Ile Trp
305 310 315 320 

Arg Gly Phe Tyr Ile Ala Gly Asp Pro Ala Leu Ala Tyr Gly Tyr Ala
325 330 335 

Gln Asp Gln Glu Pro Asp Ala Arg Gly Arg Ile Arg Asn Gly Ala Leu
340 345 350 

Leu Arg Val Tyr Val Pro Arg Ser Ser Leu Pro Gly Phe Tyr Arg Thr 
355 360 365 

Ser Leu Thr Leu Ala Ala Pro Glu Ala Ala Gly Glu Val Glu Arg Leu 
370 375 380 

Ile Gly His Pro Leu Pro Leu Arg Leu Asp Ala Ile Thr Gly Pro Glu
385 390 395 400 

Glu Glu Gly Gly Arg Leu Glu Thr Ile Leu Gly Trp Pro Leu Ala Glu
405 410 415 

Arg Thr Val Val Ile Pro Ser Ala Ile Pro Thr Asp Pro Arg Asn Val
420 425 430 

Gly Gly Asp Leu Asp Pro Ser Ser Ile Pro Asp Lys Glu Gln Ala Ile
435 440 445 



(2) INFORMATION FOR SEQ ID NO:7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1425 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(ix) FEATURE:

(A) NAME/KEY: CDS 

(B) LOCATION: 1..1416 

(D) OTHER INFORMATION: /product= "LF (1-254)--TR--PE (398-613)"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

ATG GTA CCA GCG GGC GGT CAT GGT GAT GTA GGT ATG CAC GTA AAA GAG 
Met Val Pro Ala Gly Gly His Gly Asp Val Gly Met His Val Lys Glu 
1 5 10 15

AAA GAG AAA AAT AAA GAT GAG AAT AAG AGA AAA GAT GAA GAA CGA AAT 
Lys Glu Lys Asn Lys Asp Glu Asn Lys Arg Lys Asp Glu Glu Arg Asn 
20 25 30 

AAA ACA CAG GAA GAG CAT TTA AAG GAA ATC ATG AAA CAC ATT GTA AAA 
Lys Thr Gln Glu Glu His Leu Lys Glu Ile Met Lys His Ile Val Lys
35 40 45 

ATA GAA GTA AAA GGG GAG GAA GCT GTT AAA AAA GAG GCA GCA GAA AAG 192 
Ile Glu Val Lys Gly Glu Glu Ala Val Lys Lys Glu Ala Ala Glu Lys
50 55 60 

CTA CTT GAG AAA GTA CCA TCT GAT GTT TTA GAG ATG TAT AAA GCA ATT 240 
Leu Leu Glu Lys Val Pro Ser Asp Val Leu Glu Met Tyr Lys Ala He 
65 70 75 80

GGA GGA AAG ATA TAT ATT GTG GAT GGT GAT ATT ACA AAA CAT ATA TCT
Gly Gly Lys Ile Tyr Ile Val Asp Gly Asp Ile Thr Lys His Ile Ser
85 90 95 

TTA GAA GCA TTA TCT GAA GAT AAG AAA AAA ATA AAA GAC ATT TAT GGG
Leu Glu Ala Leu Ser Glu Asp Lys Lys Lys Ile Lys Asp Ile Tyr Gly
100 105 110 

AAA GAT GCT TTA TTA CAT GAA CAT TAT GTA TAT GCA AAA GAA GGA TAT 
Lys Asp Ala Leu Leu His Glu His Tyr Val Tyr Ala Lys Glu Gly Tyr 
115 120 125 

GAA CCC GTA CTT GTA ATC CAA TCT TCG GAA GAT TAT GTA GAA AAT ACT 
Glu Pro Val Leu Val Ile Gln Ser Ser Glu Asp Tyr Val Glu Asn Thr
130 135 140 

GAA AAG GCA CTG AAC GTT TAT TAT GAA ATA GGT AAG ATA TTA TCA AGG 
Glu Lys Ala Leu Asn Val Tyr Tyr Glu Ile Gly Lys Ile Leu Ser Arg
145 150 155 160 

GAT ATT TTA AGT AAA ATT AAT CAA CCA TAT CAG AAA TTT TTA GAT GTA 
Asp Ile Leu Ser Lys Ile Asn Gln Pro Tyr Gln Lys Phe Leu Asp Val
165 170 175 

TTA AAT ACC ATT AAA AAT GCA TCT GAT TCA GAT GGA CAA GAT CTT TTA 
Leu Asn Thr Ile Lys Asn Ala Ser Asp Ser Asp Gly Gln Asp Leu Leu
180 185 190 

TTT ACT AAT CAG CTT AAG GAA CAT CCC ACA GAC TTT TCT GTA GAA TTC 
Phe Thr Asn Gln Leu Lys Glu His Pro Thr Asp Phe Ser Val Glu Phe
195 200 205 

TTG GAA CAA AAT AGC AAT GAG GTA CAA GAA GTA TTT GCG AAA GCT TTT 
Leu Glu Gln Asn Ser Asn Glu Val Gln Glu Val Phe Ala Lys Ala Phe
210 215 220 

GCA TAT TAT ATC GAG CCA CAG CAT CGT GAT GTT TTA CAG CTT TAT GCA 
Ala Tyr Tyr Ile Glu Pro Gln His Arg Asp Val Leu Gln Leu Tyr Ala
225 230 235 240 

CCG GAA GCT TTT AAT TAC ATG GAT AAA TTT AAC GAA CAA GAA ATA AAT 
Pro Glu Ala Phe Asn Tyr Met Asp Lys Phe Asn Glu Gln Glu Ile Asn
245 250 255 

CTA ACG CGT GCG GAG TTC CTC GGC GAC GGC GGC GAC GTC AGC TTC AGC 
Leu Thr Arg Ala Glu Phe Leu Gly Asp Gly Gly Asp Val Ser Phe Ser 
260 265 270

ACC CGC GGC ACG CAG AAC TGG ACG GTG GAG CGG CTG CTC CAG GCG CAC 
Thr Arg Gly Thr Gln Asn Trp Thr Val Glu Arg Leu Leu Gln Ala His
275 280 285 

CGC CAA CTG GAG GAG CGC GGC TAT GTG TTC GTC GGC TAC CAC GGC ACC 
Arg Gln Leu Glu Glu Arg Gly Tyr Val Phe Val Gly Tyr His Gly Thr
290 295 300 

TTC CTC GAA GCG GCG CAA AGC ATC GTC TTC GGC GGG GTG CGC GCG CGC 
Phe Leu Glu Ala Ala Gln Ser Ile Val Phe Gly Gly Val Arg Ala Arg
305 310 315 320 

AGC CAG GAC CTC GAC GCG ATC TGG CGC GGT TTC TAT ATC GCC GGC GAT 
Ser Gln Asp Leu Asp Ala Ile Trp Arg Gly Phe Tyr Ile Ala Gly Asp
325 330 335 

CCG GCG CTG GCC TAC GGC TAC GCC CAG GAC CAG GAA CCC GAC GCA CGC 
Pro Ala Leu Ala Tyr Gly Tyr Ala Gln Asp Gln Glu Pro Asp Ala Arg
340 345 350 

GGC CGG ATC CGC AAC GGT GCC CTG CTG CGG GTC TAT GTG CCG CGC TCG 
Gly Arg Ile Arg Asn Gly Ala Leu Leu Arg Val Tyr Val Pro Arg Ser
355 360 365 

AGC CTG CCG GGC TTC TAC CGC ACC AGC CTG ACC CTG GCC GCG CCG GAG 
Ser Leu Pro Gly Phe Tyr Arg Thr Ser Leu Thr Leu Ala Ala Pro Glu
370 375 380 

GCG GCG GGC GAG GTC GAA CGG CTG ATC GGC CAT CCG CTG CCG CTG CGC 
Ala Ala Gly Glu Val Glu Arg Leu Ile Gly His Pro Leu Pro Leu Arg
385 390 395 400 

CTG GAC GCC ATC ACC GGC CCC GAG GAG GAA GGC GGG CGC CTG GAG ACC 
Leu Asp Ala Ile Thr Gly Pro Glu Glu Glu Gly Gly Arg Leu Glu Thr
405 410 415 

ATT CTC GGC TGG CCG CTG GCC GAG CGC ACC GTG GTG ATT CCC TCG GCG 
Ile Leu Gly Trp Pro Leu Ala Glu Arg Thr Val Val Ile Pro Ser Ala
420 425 430 

ATC CCC ACC GAC CCG CGC AAC GTC GGC GGC GAC CTC GAC CCG TCC AGC 
Ile Pro Thr Asp Pro Arg Asn Val Gly Gly Asp Leu Asp Pro Ser Ser
435 440 445 

ATC CCC GAC AAG GAA CAG GCG ATC AGC GCC CTG CCG GAC TAC GCC AGC 
Ile Pro Asp Lys Glu Gln Ala Ile Ser Ala Leu Pro Asp Tyr Ala Ser
450 455 460 

CAG CCC GGC AAA CCG CCG CGC GAG GACCTGAAG 
Gln Pro Gly Lys Pro Pro Arg Glu
465 470 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 472 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Val Pro Ala Gly Gly His Gly Asp Val Gly Met His Val Lys Glu 
1 5 10 15

Lys Glu Lys Asn Lys Asp Glu Asn Lys Arg Lys Asp Glu Glu Arg Asn
20 25 30 

Lys Thr Gln Glu Glu His Leu Lys Glu Ile Met Lys His Ile Val Lys
35 40 45 

Ile Glu Val Lys Gly Glu Glu Ala Val Lys Lys Glu Ala Ala Glu Lys
50 55 60 

Leu Leu Glu Lys Val Pro Ser Asp Val Leu Glu Met Tyr Lys Ala Ile
65 70 75 80 

Gly Gly Lys Ile Tyr Ile Val Asp Gly Asp Ile Thr Lys His Ile Ser
85 90 95 

Leu Glu Ala Leu Ser Glu Asp Lys Lys Lys Ile Lys Asp Ile Tyr Gly
100 105 110 

Lys Asp Ala Leu Leu His Glu His Tyr Val Tyr Ala Lys Glu Gly Tyr 
115 120 125 

Glu Pro Val Leu Val Ile Gln Ser Ser Glu Asp Tyr Val Glu Asn Thr
130 135 140 

Glu Lys Ala Leu Asn Val Tyr Tyr Glu Ile Gly Lys Ile Leu Ser Arg
145 150 155 160 

Asp Ile Leu Ser Lys Ile Asn Gln Pro Tyr Gln Lys Phe Leu Asp Val
165 170 175 

Leu Asn Thr Ile Lys Asn Ala Ser Asp Ser Asp Gly Gln Asp Leu Leu
180 185 190 

Phe Thr Asn Gln Leu Lys Glu His Pro Thr Asp Phe Ser Val Glu Phe
195 200 205 

Leu Glu Gln Asn Ser Asn Glu Val Gln Glu Val Phe Ala Lys Ala Phe
210 215 220 

Ala Tyr Tyr Ile Glu Pro Gln His Arg Asp Val Leu Gln Leu Tyr Ala
225 230 235 240 

Pro Glu Ala Phe Asn Tyr Met Asp Lys Phe Asn Glu Gln Glu Ile Asn
245 250 255 

Leu Thr Arg Ala Glu Phe Leu Gly Asp Gly Gly Asp Val Ser Phe Ser 
260 265 270 

Thr Arg Gly Thr Gln Asn Trp Thr Val Glu Arg Leu Leu Gln Ala His
275 280 285 

Arg Gln Leu Glu Glu Arg Gly Tyr Val Phe Val Gly Tyr His Gly Thr
290 295 300 

Phe Leu Glu Ala Ala Gln Ser Ile Val Phe Gly Gly Val Arg Ala Arg
305 310 315 320 

Ser Gln Asp Leu Asp Ala Ile Trp Arg Gly Phe Tyr Ile Ala Gly Asp
325 330 335 

Pro Ala Leu Ala Tyr Gly Tyr Ala Gln Asp Gln Glu Pro Asp Ala Arg
340 345 350 

Gly Arg Ile Arg Asn Gly Ala Leu Leu Arg Val Tyr Val Pro Arg Ser
355 360 365 

Ser Leu Pro Gly Phe Tyr Arg Thr Ser Leu Thr Leu Ala Ala Pro Glu 
370 375 380 

Ala Ala Gly Glu Val Glu Arg Leu Ile Gly His Pro Leu Pro Leu Arg
385 390 395 400

Leu Asp Ala Ile Thr Gly Pro Glu Glu Glu Gly Gly Arg Leu Glu Thr
405 410 415 

Ile Leu Gly Trp Pro Leu Ala Glu Arg Thr Val Val Ile Pro Ser Ala
420 425 430 

Ile Pro Thr Asp Pro Arg Asn Val Gly Gly Asp Leu Asp Pro Ser Ser
435 440 445 

Ile Pro Asp Lys Glu Gln Ala Ile Ser Ala Leu Pro Asp Tyr Ala Ser
450 455 460 

Gln Pro Gly Lys Pro Pro Arg Glu
465 470 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1524 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 1..1524 

(D) OTHER INFORMATION: /product= "LF (1-254)--TR--PE (362-613)"






(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
GCG GGC GGT CAT GGT GAT GTA GGT ATG CAC GTA AAA GAG AAA GAG AAA
Ala Gly Gly His Gly Asp Val Gly Met His Val Lys Glu Lys Glu Lys 
1 5 10 15

AAT AAA GAT GAG AAT AAG AGA AAA GAT GAA GAA CGA AAT AAA ACA CAG 
Asn Lys Asp Glu Asn Lys Arg Lys Asp Glu Glu Arg Asn Lys Thr Gln
20 25 30 

GAA GAG CAT TTA AAG GAA ATC ATG AAA CAC ATT GTA AAA ATA GAA GTA 
Glu Glu His Leu Lys Glu Ile Met Lys His Ile Val Lys Ile Glu Val
35 40 45 

AAA GGG GAG GAA GCT GTT AAA AAA GAG GCA GCA GAA AAG CTA CTT GAG 
Lys Gly Glu Glu Ala Val Lys Lys Glu Ala Ala Glu Lys Leu Leu Glu 
50 55 60 

AAA GTA CCA TCT GAT GTT TTA GAG ATG TAT AAA GCA ATT GGA GGA AAG 
Lys Val Pro Ser Asp Val Leu Glu Met Tyr Lys Ala Ile Gly Gly Lys
65 70 75 80 

ATA TAT ATT GTG GAT GGT GAT ATT ACA AAA CAT ATA TCT TTA GAA GCA 
Ile Tyr Ile Val Asp Gly Asp Ile Thr Lys His Ile Ser Leu Glu Ala
85 90 95 

TTA TCT GAA GAT AAG AAA AAA ATA AAA GAC ATT TAT GGG AAA GAT GCT 
Leu Ser Glu Asp Lys Lys Lys Ile Lys Asp Ile Tyr Gly Lys Asp Ala
100 105 110 

TTA TTA CAT GAA CAT TAT GTA TAT GCA AAA GAA GGA TAT GAA CCC GTA 
Leu Leu His Glu His Tyr Val Tyr Ala Lys Glu Gly Tyr Glu Pro Val 
115 120 125

CTT GTA ATC CAA TCT TCG GAA GAT TAT GTA GAA AAT ACT GAA AAG GCA 
Leu Val Ile Gln Ser Ser Glu Asp Tyr Val Glu Asn Thr Glu Lys Ala
130 135 140 

CTG AAC GTT TAT TAT GAA ATA GGT AAG ATA TTA TCA AGG GAT ATT TTA 
Leu Asn Val Tyr Tyr Glu Ile Gly Lys Ile Leu Ser Arg Asp Ile Leu
145 150 155 160 

AGT AAA ATT AAT CAA CCA TAT CAG AAA TTT TTA GAT GTA TTA AAT ACC 
Ser Lys Ile Asn Gln Pro Tyr Gln Lys Phe Leu Asp Val Leu Asn Thr
165 170 175 

ATT AAA AAT GCA TCT GAT TCA GAT GGA CAA GAT CTT TTA TTT ACT AAT 
Ile Lys Asn Ala Ser Asp Ser Asp Gly Gln Asp Leu Leu Phe Thr Asn
180 185 190 

CAG CTT AAG GAA CAT CCC ACA GAC TTT TCT GTA GAA TTC TTG GAA CAA 
Gln Leu Lys Glu His Pro Thr Asp Phe Ser Val Glu Phe Leu Glu Gln
195 200 205 

AAT AGC AAT GAG GTA CAA GAA GTA TTT GCG AAA GCT TTT GCA TAT TAT 
Asn Ser Asn Glu Val Gln Glu Val Phe Ala Lys Ala Phe Ala Tyr Tyr
210 215 220 

ATC GAG CCA CAG CAT CGT GAT GTT TTA CAG CTT TAT GCA CCG GAA GCT 
Ile Glu Pro Gln His Arg Asp Val Leu Gln Leu Tyr Ala Pro Glu Ala
225 230 235 240 

TTT AAT TAC ATG GAT AAA TTT AAC GAA CAA GAA ATA AAT CTA ACG CGT 
Phe Asn Tyr Met Asp Lys Phe Asn Glu Gln Glu Ile Asn Leu Thr Arg
245 250 255 

GCG GCC AAC GCC GAC GTG GTG AGC CTG ACC TGC CCG GTC GCC GCC GGT 
Ala Ala Asn Ala Asp Val Val Ser Leu Thr Cys Pro Val Ala Ala Gly 
260 265 270 

GAA TGC GCG GGC CCG GCG GAC AGC GGC GAC GCC CTG CTG GAG CGC AAC 
TAT CCC ACT GGC GCG GAG TTC CTC GGC GAC GGC GGC GAC GTC AGC TTC
Tyr Pro Thr Gly Ala Glu Phe Leu Gly Asp Gly Gly Asp Val Ser Phe 
290 295 300 

AGC ACC CGC GGC ACG CAG AAC TGG ACG GTG GAG CGG CTG CTC CAG GCG 
Ser Thr Arg Gly Thr Gln Asn Trp Thr Val Glu Arg Leu Leu Gln Ala
305 310 315 320 

CAC CGC CAA CTG GAG GAG CGC GGC TAT GTG TTC GTC GGC TAC CAC GGC 
His Arg Gln Leu Glu Glu Arg Gly Tyr Val Phe Val Gly Tyr His Gly
325 330 335 

ACC TTC CTC GAA GCG GCG CAA AGC ATC GTC TTC GGC GGG GTG CGC GCG 
Thr Phe Leu Glu Ala Ala Gln Ser Ile Val Phe Gly Gly Val Arg Ala
340 345 350 

CGC AGC CAG GAC CTC GAC GCG ATC TGG CGC GGT TTC TAT ATC GCC GGC 
Arg Ser Gln Asp Leu Asp Ala Ile Trp Arg Gly Phe Tyr Ile Ala Gly
355 360 365 

GAT CCG GCG CTG GCC TAC GGC TAC GCC CAG GAC CAG GAA CCC GAC GCA 
Asp Pro Ala Leu Ala Tyr Gly Tyr Ala Gln Asp Gln Glu Pro Asp Ala
370 375 380 

CGC GGC CGG ATC CGC AAC GGT GCC CTG CTG CGG GTC TAT GTG CCG CGC 
Arg Gly Arg Ile Arg Asn Gly Ala Leu Leu Arg Val Tyr Val Pro Arg
385 390 395 400 

TCG AGC CTG CCG GGC TTC TAC CGC ACC AGC CTG ACC CTG GCC GCG CCG 
Ser Ser Leu Pro Gly Phe Tyr Arg Thr Ser Leu Thr Leu Ala Ala Pro 
405 410 415 

GAG GCG GCG GGC GAG GTC GAA CGG CTG ATC GGC CAT CCG CTG CCG CTG 
Glu Ala Ala Gly Glu Val Glu Arg Leu Ile Gly His Pro Leu Pro Leu
420 425 430

CGC CTG GAC GCC ATC ACC GGC CCC GAG GAG GAA GGC GGG CGC CTG GAG 
Arg Leu Asp Ala Ile Thr Gly Pro Glu Glu Glu Gly Gly Arg Leu Glu
435 440 445 

ACC ATT CTC GGC TGG CCG CTG GCC GAG CGC ACC GTG GTG ATT CCC TCG 
Thr Ile Leu Gly Trp Pro Leu Ala Glu Arg Thr Val Val Ile Pro Ser
450 455 460 

GCG ATC CCC ACC GAC CCG CGC AAC GTC GGC GGC GAC CTC GAC CCG TCC 
Ala Ile Pro Thr Asp Pro Arg Asn Val Gly Gly Asp Leu Asp Pro Ser
465 470 475 480 

AGC ATC CCC GAC AAG GAA CAG GCG ATC AGC GCC CTG CCG GAC TAC GCC 
Ser Ile Pro Asp Lys Glu Gln Ala Ile Ser Ala Leu Pro Asp Tyr Ala
485 490 495 

AGC CAG CCC GGC AAA CCG CCG CGC GAG GAC CTG AAG 
Ser Gln Pro Gly Lys Pro Pro Arg Glu Asp Leu Lys
500 505 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 508 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



WO 94/18332 



PCT/US94/01624 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: 

Ala Gly Gly His Gly Asp Val Gly Met His Val Lys Glu Lys Glu Lys 
1 5 10 15

Asn Lys Asp Glu Asn Lys Arg Lys Asp Glu Glu Arg Asn Lys Thr Gln
20 25 30 

Glu Glu His Leu Lys Glu Ile Met Lys His Ile Val Lys Ile Glu Val
35 40 45 

Lys Gly Glu Glu Ala Val Lys Lys Glu Ala Ala Glu Lys Leu Leu Glu 
50 55 60 

Lys Val Pro Ser Asp Val Leu Glu Met Tyr Lys Ala Ile Gly Gly Lys
65 70 75 80 

Ile Tyr Ile Val Asp Gly Asp Ile Thr Lys His Ile Ser Leu Glu Ala
85 90 95 

Leu Ser Glu Asp Lys Lys Lys Ile Lys Asp Ile Tyr Gly Lys Asp Ala
100 105 110 

Leu Leu His Glu His Tyr Val Tyr Ala Lys Glu Gly Tyr Glu Pro Val 
115 120 125 

Leu Val Ile Gln Ser Ser Glu Asp Tyr Val Glu Asn Thr Glu Lys Ala
130 135 140 
Ser Lys Ile Asn Gln Pro Tyr Gln Lys Phe Leu Asp Val Leu Asn Thr
165 170 175 

Ile Lys Asn Ala Ser Asp Ser Asp Gly Gln Asp Leu Leu Phe Thr Asn
180 185 190 

Gln Leu Lys Glu His Pro Thr Asp Phe Ser Val Glu Phe Leu Glu Gln
195 200 205 



Ala Ala Asn Ala Asp Val Val Ser Leu Thr Cys Pro Val Ala Ala Gly 
260 265 270 



Ser Thr Arg Gly Thr Gln Asn Trp Thr Val Glu Arg Leu Leu Gln Ala
305 310 315 320 



Thr Phe Leu Glu Ala Ala Gln Ser Ile Val Phe Gly Gly Val Arg Ala
340 345 350 



Arg Gly Arg Ile Arg Asn Gly Ala Leu Leu Arg Val Tyr Val Pro Arg
385 390 395 400 

Ser Ser Leu Pro Gly Phe Tyr Arg Thr Ser Leu Thr Leu Ala Ala Pro 
405 410 415 



Arg Leu Asp Ala Ile Thr Gly Pro Glu Glu Glu Gly Gly Arg Leu Glu
435 440 445

Thr Ile Leu Gly Trp Pro Leu Ala Glu Arg Thr Val Val Ile Pro Ser
450 455 460 



(2) INFORMATION FOR SEQ ID NO: 11: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2709 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacillus anthracis 

(ix) FEATURE:

(A) NAME/KEY: CDS 

(B) LOCATION: 1..2709 

(D) OTHER INFORMATION: /product= "PA (1-725) Human C residues (1-178)"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

GAA GTT AAA CAG GAG AAC CGG TTA TTA AAT GAA TCA GAA TCA AGT TCC 
Glu Val Lys Gln Glu Asn Arg Leu Leu Asn Glu Ser Glu Ser Ser Ser
1 5 10 15 

CAG GGG TTA CTA GGA TAC TAT TTT AGT GAT TTG AAT TTT CAA GCA CCC 
Gln Gly Leu Leu Gly Tyr Tyr Phe Ser Asp Leu Asn Phe Gln Ala Pro
20 25 30 

ATG GTG GTT ACC TCT TCT ACT ACA GGG GAT TTA TCT ATT CCT AGT TCT 
Met Val Val Thr Ser Ser Thr Thr Gly Asp Leu Ser Ile Pro Ser Ser
35 40 45 

GAG TTA GAA AAT ATT CCA TCG GAA AAC CAA TAT TTT CAA TCT GCT ATT 
Glu Leu Glu Asn Ile Pro Ser Glu Asn Gln Tyr Phe Gln Ser Ala Ile
50 55 60 

TGG TCA GGA TTT ATC AAA GTT AAG AAG AGT GAT GAA TAT ACA TTT GCT 
Trp Ser Gly Phe Ile Lys Val Lys Lys Ser Asp Glu Tyr Thr Phe Ala
65 70 75 80 

ACT TCC GCT GAT AAT CAT GTA ACA ATG TGG GTA GAT GAC CAA GAA GTG 
Thr Ser Ala Asp Asn His Val Thr Met Trp Val Asp Asp Gln Glu Val
85 90 95 

ATT AAT AAA GCT TCT AAT TCT AAC AAA ATC AGA TTA GAA AAA GGA AGA 
Ile Asn Lys Ala Ser Asn Ser Asn Lys Ile Arg Leu Glu Lys Gly Arg
100 105 110 

TTA TAT CAA ATA AAA ATT CAA TAT CAA CGA GAA AAT CCT ACT GAA AAA 
Leu Tyr Gln Ile Lys Ile Gln Tyr Gln Arg Glu Asn Pro Thr Glu Lys
115 120 125 

GGA TTG GAT TTC AAG TTG TAC TGG ACC GAT TCT CAA AAT AAA AAA GAA 
Gly Leu Asp Phe Lys Leu Tyr Trp Thr Asp Ser Gln Asn Lys Lys Glu
130 135 140 

GTG ATT TCT AGT GAT AAC TTA CAA TTG CCA GAA TTA AAA CAA AAA TCT 
Val Ile Ser Ser Asp Asn Leu Gln Leu Pro Glu Leu Lys Gln Lys Ser
145 150 155 160 

TCG AAC TCA AGA AAA AAG CGA AGT ACA AGT GCT GGA CCT ACG GTT CCA 
Ser Asn Ser Arg Lys Lys Arg Ser Thr Ser Ala Gly Pro Thr Val Pro 
165 170 175 

GAC CGT GAC AAT GAT GGA ATC CCT GAT TCA TTA GAG GTA GAA GGA TAT 
Asp Arg Asp Asn Asp Gly Ile Pro Asp Ser Leu Glu Val Glu Gly Tyr
180 185 190 
ACG GTT GAT GTC AAA AAT AAA AGA ACT TTT CTT TCA CCA TGG ATT TCT
Thr Val Asp Val Lys Asn Lys Arg Thr Phe Leu Ser Pro Trp Ile Ser
195 200 205 

AAT ATT CAT GAA AAG AAA GGA TTA ACC AAA TAT AAA TCA TCT CCT GAA 
Asn Ile His Glu Lys Lys Gly Leu Thr Lys Tyr Lys Ser Ser Pro Glu
210 215 220 

AAA TGG AGC ACG GCT TCT GAT CCG TAC AGT GAT TTC GAA AAG GTT ACA 
Lys Trp Ser Thr Ala Ser Asp Pro Tyr Ser Asp Phe Glu Lys Val Thr 
225 230 235 240 

GGA CGG ATT GAT AAG AAT GTA TCA CCA GAG GCA AGA CAC CCC CTT GTG 
Gly Arg Ile Asp Lys Asn Val Ser Pro Glu Ala Arg His Pro Leu Val
245 250 255 

GCA GCT TAT CCG ATT GTA CAT GTA GAT ATG GAG AAT ATT ATT CTC TCA 
Ala Ala Tyr Pro Ile Val His Val Asp Met Glu Asn Ile Ile Leu Ser
260 265 270 

AAA AAT GAG GAT CAA TCC ACA CAG AAT ACT GAT AGT GAA ACG AGA ACA 
Lys Asn Glu Asp Gln Ser Thr Gln Asn Thr Asp Ser Glu Thr Arg Thr
275 280 285 

ATA AGT AAA AAT ACT TCT ACA AGT AGG ACA CAT ACT AGT GAA GTA CAT 
Ile Ser Lys Asn Thr Ser Thr Ser Arg Thr His Thr Ser Glu Val His
290 295 300 

GGA AAT GCA GAA GTG CAT GCG TCG TTC TTT GAT ATT GGT GGG AGT GTA 
Gly Asn Ala Glu Val His Ala Ser Phe Phe Asp Ile Gly Gly Ser Val
305 310 315 320 

TCT GCA GGA TTT AGT AAT TCG AAT TCA AGT ACG GTC GCA ATT GAT CAT 
Ser Ala Gly Phe Ser Asn Ser Asn Ser Ser Thr Val Ala Ile Asp His
325 330 335 

TCA CTA TCT CTA GCA GGG GAA AGA ACT TGG GCT GAA ACA ATG GGT TTA 
Ser Leu Ser Leu Ala Gly Glu Arg Thr Trp Ala Glu Thr Met Gly Leu 
340 345 350 

AAT ACC GCT GAT ACA GCA AGA TTA AAT GCC AAT ATT AGA TAT GTA AAT 
Asn Thr Ala Asp Thr Ala Arg Leu Asn Ala Asn Ile Arg Tyr Val Asn
355 360 365 

ACT GGG ACG GCT CCA ATC TAC AAC GTG TTA CCA ACG ACT TCG TTA GTG 
Thr Gly Thr Ala Pro Ile Tyr Asn Val Leu Pro Thr Thr Ser Leu Val
370 375 380 

TTA GGA AAA AAT CAA ACA CTC GCG ACA ATT AAA GCT AAG GAA AAC CAA 
Leu Gly Lys Asn Gln Thr Leu Ala Thr Ile Lys Ala Lys Glu Asn Gln
385 390 395 400

TTA AGT CAA ATA CTT GCA CCT AAT AAT TAT TAT CCT TCT AAA AAC TTG 
Leu Ser Gln Ile Leu Ala Pro Asn Asn Tyr Tyr Pro Ser Lys Asn Leu
405 410 415 

GCG CCA ATC GCA TTA AAT GCA CAA GAC GAT TTC AGT TCT ACT CCA ATT 
Ala Pro Ile Ala Leu Asn Ala Gln Asp Asp Phe Ser Ser Thr Pro Ile
420 425 430 

ACA ATG AAT TAC AAT CAA TTT CTT GAG TTA GAA AAA ACG AAA CAA TTA 
Thr Met Asn Tyr Asn Gln Phe Leu Glu Leu Glu Lys Thr Lys Gln Leu
435 440 445 

AGA TTA GAT ACG GAT CAA GTA TAT GGG AAT ATA GCA ACA TAC AAT TTT 
Arg Leu Asp Thr Asp Gln Val Tyr Gly Asn Ile Ala Thr Tyr Asn Phe
450 455 460 

GAA AAT GGA AGA GTG AGG GTG GAT ACA GGC TCG AAC TGG AGT GAA GTG 
Glu Asn Gly Arg Val Arg Val Asp Thr Gly Ser Asn Trp Ser Glu Val
465 470 475 480 

TTA CCG CAA ATT CAA GAA ACA ACT GCA CGT ATC ATT TTT AAT GGA AAA 1488
Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg Ile Ile Phe Asn Gly Lys
485 490 495 

GAT TTA AAT CTG GTA GAA AGG CGG ATA GCG GCG GTT AAT CCT AGT GAT 1536
Asp Leu Asn Leu Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser Asp
500 505 510

CCA TTA GAA ACG ACT AAA CCG GAT ATG ACA TTA AAA GAA GCC CTT AAA 1584 
Pro Leu Glu Thr Thr Lys Pro Asp Met Thr Leu Lys Glu Ala Leu Lys 
515 520 525 


ATA GCA TTT GGA TTT AAC GAA CCG AAT GGA AAC TTA CAA TAT CAA GGG 1632 
Ile Ala Phe Gly Phe Asn Glu Pro Asn Gly Asn Leu Gln Tyr Gln Gly
530 535 540 

AAA GAC ATA ACC GAA TTT GAT TTT AAT TTC GAT CAA CAA ACA TCT CAA 1680

Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe Asp Gln Gln Thr Ser Gln
545 550 555 560 

AAT ATC AAG AAT CAG TTA GCG GAA TTA AAC GCA ACT AAC ATA TAT ACT 1728 
Asn Ile Lys Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr Thr
565 570 575 

GTA TTA GAT AAA ATC AAA TTA AAT GCA AAA ATG AAT ATT TTA ATA AGA 1776 
Val Leu Asp Lys Ile Lys Leu Asn Ala Lys Met Asn Ile Leu Ile Arg
580 585 590

GAT AAA CGT TTT CAT TAT GAT AGA AAT AAC ATA GCA GTT GGG GCG GAT 1824 
Asp Lys Arg Phe His Tyr Asp Arg Asn Asn Ile Ala Val Gly Ala Asp
595 600 605 


GAG TCA GTA GTT AAG GAG GCT CAT AGA GAA GTA ATT AAT TCG TCA ACA 1872 
Glu Ser Val Val Lys Glu Ala His Arg Glu Val Ile Asn Ser Ser Thr
610 615 620 

GAG GGA TTA TTG TTA AAT ATT GAT AAG GAT ATA AGA AAA ATA TTA TCA 1920

Glu Gly Leu Leu Leu Asn Ile Asp Lys Asp Ile Arg Lys Ile Leu Ser
625 630 635 640 

GGT TAT ATT GTA GAA ATT GAA GAT ACT GAA GGG CTT AAA GAA GTT ATA 1968

Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu Gly Leu Lys Glu Val Ile
645 650 655 

AAT GAC AGA TAT GAT ATG TTG AAT ATT TCT AGT TTA CGG CAA GAT GGA 2016 
Asn Asp Arg Tyr Asp Met Leu Asn Ile Ser Ser Leu Arg Gln Asp Gly
660 665 670

AAA ACA TTT ATA GAT TTT AAA AAA TAT AAT GAT AAA TTA CCG TTA TAT 2064 
Lys Thr Phe Ile Asp Phe Lys Lys Tyr Asn Asp Lys Leu Pro Leu Tyr
675 680 685 


ATA AGT AAT CCC AAT TAT AAG GTA AAT GTA TAT GCT GTT ACT AAA GAA 2112 
Ile Ser Asn Pro Asn Tyr Lys Val Asn Val Tyr Ala Val Thr Lys Glu
690 695 700 

AAC ACT ATT ATT AAT CCT AGT GAG AAT GGG GAT ACT AGT ACC AAC GGG 2160

Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly Asp Thr Ser Thr Asn Gly
705 710 715 720 

ATC AAG AAA ATT TTA AAG AAA GTG GTG CTG GGC AAA AAA GGG GAT ACA 2208 
Ile Lys Lys Ile Leu Lys Lys Val Val Leu Gly Lys Lys Gly Asp Thr
725 730 735 

GTG GAA CTG ACC TGT ACA GCT TCC CAG AAG AAG AGC ATA CAA TTC CAC 2256 
Val Glu Leu Thr Cys Thr Ala Ser Gln Lys Lys Ser Ile Gln Phe His
TGG AAA AAC TCC AAC CAG ATA AAG ATT CTG GGA AAT CAG GGC TCC TTC
Trp Lys Asn Ser Asn Gln Ile Lys Ile Leu Gly Asn Gln Gly Ser Phe
755 760 765 

TTA ACT AAA GGT CCA TCC AAG CTG AAT GAT CGC GCT GAC TCA AGA AGA 
Leu Thr Lys Gly Pro Ser Lys Leu Asn Asp Arg Ala Asp Ser Arg Arg 
770 775 780

AGC CTT TGG GAC CAA GGA AAC TTC CCC CTG ATC ATC AAG AAT CTT AAG 
Ser Leu Trp Asp Gln Gly Asn Phe Pro Leu Ile Ile Lys Asn Leu Lys
785 790 795 800 

ATA GAA GAC TCA GAT ACT TAC ATC TGT GAA GTG GAG GAC CAG AAG GAG 
Ile Glu Asp Ser Asp Thr Tyr Ile Cys Glu Val Glu Asp Gln Lys Glu
805 810 815 

GAG GTG CAA TTG CTA GTG TTC GGA TTG ACT GCC AAC TCT GAC ACC CAC 
Glu Val Gln Leu Leu Val Phe Gly Leu Thr Ala Asn Ser Asp Thr His
820 825 830 

CTG CTT CAG GGG CAG AGC CTG ACC CTG ACC TTG GAG AGC CCC CCT GGT 
Leu Leu Gln Gly Gln Ser Leu Thr Leu Thr Leu Glu Ser Pro Pro Gly
835 840 845 

AGT AGC CCC TCA GTG CAA TGT AGG AGT CCA AGG GGT AAA AAC ATA CAG 
Ser Ser Pro Ser Val Gln Cys Arg Ser Pro Arg Gly Lys Asn Ile Gln
850 855 860 

GGG GGG AAG ACC CTC TCC GTG TCT CAG CTG GAG CTC CAG GAT AGT GGC 
Gly Gly Lys Thr Leu Ser Val Ser Gln Leu Glu Leu Gln Asp Ser Gly
865 870 875 880 

ACC TGG ACA TGC ACT GTC TTG CAG AAC CAG AAG AAG GTG GAG TTC AAA 
Thr Trp Thr Cys Thr Val Leu Gln Asn Gln Lys Lys Val Glu Phe Lys
885 890 895 

ATA GAC ATC GTG GTG CTA GCT 
Ile Asp Ile Val Val Leu Ala
900 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 903 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Glu Val Lys Gln Glu Asn Arg Leu Leu Asn Glu Ser Glu Ser Ser Ser
1 5 10 15

Gln Gly Leu Leu Gly Tyr Tyr Phe Ser Asp Leu Asn Phe Gln Ala Pro
20 25 30 

Met Val Val Thr Ser Ser Thr Thr Gly Asp Leu Ser Ile Pro Ser Ser
35 40 45 

Glu Leu Glu Asn Ile Pro Ser Glu Asn Gln Tyr Phe Gln Ser Ala Ile
50 55 60 

Trp Ser Gly Phe Ile Lys Val Lys Lys Ser Asp Glu Tyr Thr Phe Ala
65 70 75 80

Thr Ser Ala Asp Asn His Val Thr Met Trp Val Asp Asp Gln Glu Val
85 90 95 

Ile Asn Lys Ala Ser Asn Ser Asn Lys Ile Arg Leu Glu Lys Gly Arg
100 105 110

Leu Tyr Gln Ile Lys Ile Gln Tyr Gln Arg Glu Asn Pro Thr Glu Lys
115 120 125 

Gly Leu Asp Phe Lys Leu Tyr Trp Thr Asp Ser Gln Asn Lys Lys Glu
130 135 140 

Val Ile Ser Ser Asp Asn Leu Gln Leu Pro Glu Leu Lys Gln Lys Ser
145 150 155 160 

Ser Asn Ser Arg Lys Lys Arg Ser Thr Ser Ala Gly Pro Thr Val Pro 
165 170 175 



Thr Val Asp Val Lys Asn Lys Arg Thr Phe Leu Ser Pro Trp Ile Ser
195 200 205 

Asn Ile His Glu Lys Lys Gly Leu Thr Lys Tyr Lys Ser Ser Pro Glu
210 215 220 

Lys Trp Ser Thr Ala Ser Asp Pro Tyr Ser Asp Phe Glu Lys Val Thr 
225 230 235 240 

Gly Arg Ile Asp Lys Asn Val Ser Pro Glu Ala Arg His Pro Leu Val
245 250 255 



Lys Asn Glu Asp Gln Ser Thr Gln Asn Thr Asp Ser Glu Thr Arg Thr
275 280 285 

Ile Ser Lys Asn Thr Ser Thr Ser Arg Thr His Thr Ser Glu Val His
290 295 300 

Gly Asn Ala Glu Val His Ala Ser Phe Phe Asp Ile Gly Gly Ser Val
305 310 315 320

Ser Ala Gly Phe Ser Asn Ser Asn Ser Ser Thr Val Ala Ile Asp His
325 330 335



Asn Thr Ala Asp Thr Ala Arg Leu Asn Ala Asn Ile Arg Tyr Val Asn
355 360 365 

Thr Gly Thr Ala Pro Ile Tyr Asn Val Leu Pro Thr Thr Ser Leu Val
370 375 380 

Leu Gly Lys Asn Gln Thr Leu Ala Thr Ile Lys Ala Lys Glu Asn Gln
385 390 395 400



Ala Pro Ile Ala Leu Asn Ala Gln Asp Asp Phe Ser Ser Thr Pro Ile
420 425 430 

Thr Met Asn Tyr Asn Gln Phe Leu Glu Leu Glu Lys Thr Lys Gln Leu
435 440 445 



Glu Asn Gly Arg Val Arg Val Asp Thr Gly Ser Asn Trp Ser Glu Val 
465 470 475 480

Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg Ile Ile Phe Asn Gly Lys
485 490 495 

Asp Leu Asn Leu Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser Asp
500 505 510 



Ile Ala Phe Gly Phe Asn Glu Pro Asn Gly Asn Leu Gln Tyr Gln Gly
530 535 540 

Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe Asp Gln Gln Thr Ser Gln
545 550 555 560 

Asn Ile Lys Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr Thr
565 570 575 

Val Leu Asp Lys Ile Lys Leu Asn Ala Lys Met Asn Ile Leu Ile Arg
580 585 590 



Glu Ser Val Val Lys Glu Ala His Arg Glu Val Ile Asn Ser Ser Thr
610 615 620

Glu Gly Leu Leu Leu Asn Ile Asp Lys Asp Ile Arg Lys Ile Leu Ser
625 630 635 640 

Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu Gly Leu Lys Glu Val Ile
645 650 655 

Asn Asp Arg Tyr Asp Met Leu Asn Ile Ser Ser Leu Arg Gln Asp Gly
660 665 670 



Ile Ser Asn Pro Asn Tyr Lys Val Asn Val Tyr Ala Val Thr Lys Glu
690 695 700 

Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly Asp Thr Ser Thr Asn Gly
705 710 715 720 
Ile Glu Asp Ser Asp Thr Tyr Ile Cys Glu Val Glu Asp Gln Lys Glu
805 810 815 



Gly Gly Lys Thr Leu Ser Val Ser Gln Leu Glu Leu Gln Asp Ser Gly
865 870 875 880 

Thr Trp Thr Cys Thr Val Leu Gln Asn Gln Lys Lys Val Glu Phe Lys
885 890 895 



Ile Asp Ile Val Val Leu Ala
900 
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacillus anthracis 

(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..8 

(D) OTHER INFORMATION: /label= PAHIV 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Ser Gln Asn Tyr Pro Val Val Gln
1 5 

(2) INFORMATION FOR SEQ ID NO:14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis 

(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..12

(D) OTHER INFORMATION: /label= PAHIV-1 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Gln Val Ser Gln Asn Tyr Pro Ile Val Gln Asn Ile
1 5 10

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: NO 

(v) FRAGMENT TYPE: internal 
(ix) FEATURE:

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..12 

(D) OTHER INFORMATION: /label= PAHIV-2 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

Asn Thr Ala Thr Ile Met Met Gln Arg Gly Asn Phe



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..12 

(D) OTHER INFORMATION: /label= PAHIV-3 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: 

Thr Val Ser Phe Asn Phe Pro Gln Ile Thr Leu Trp
1 5 10

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(v) FRAGMENT TYPE: internal 



(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..13 

(D) OTHER INFORMATION: /label= PAHIV-4 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
Gly Gly Ser Ala Phe Asn Phe Pro Ile Val Met Gly Gly
(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacillus anthracis 

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 3..44

(D) OTHER INFORMATION: /product= "Primer 1A" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

CG CAA GTA TCA CAA AAT TAT CCG ATC GTG CAA AAC ATA CTG CAG 
Gln Val Ser Gln Asn Tyr Pro Ile Val Gln Asn Ile Leu Gln



(2) INFORMATION FOR SEQ ID NO:19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Gln Val Ser Gln Asn Tyr Pro Ile Val Gln Asn Ile Leu Gln
1 5 10

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: YES

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacillus anthracis 

(ix) FEATURE:

(A) NAME/KEY: misc_feature 

(B) LOCATION: 1..46

(D) OTHER INFORMATION: /product= "PRIMER 1B"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
GTTCCTGCAG TATGTTTTGC ACGATCGGAT AATTTTGTGA TACTTG 
(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(ix) FEATURE:

(A) NAME/KEY: CDS 

(B) LOCATION: 3..44
(D) OTHER INFORMATION: 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 



3 AAC ACT GCC ACT ATC ATG ATG CAA CGT GGT AAT TTT CTG CAG 
Asn Thr Ala Thr Ile Met Met Gln Arg Gly Asn Phe Leu Gln



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Asn Thr Ala Thr Ile Met Met Gln Arg Gly Asn Phe Leu Gln
1 5 10

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: YES

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: misc_feature 

(B) LOCATION: 1..46

(D) OTHER INFORMATION: /product= "PRIMER 2B" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 




GTCCCTGCAG AAAATTACCA CGTTGCATCA TGATAGTGGC AGTGTT 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 3..44

(D) OTHER INFORMATION: /product= "Primer 3A" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

CG ACT GTC TCT TTT AAC TTC CCG CAA ATC ACG CTT TGG CTG CAG 
Thr Val Ser Phe Asn Phe Pro Gln Ile Thr Leu Trp Leu Gln



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

Thr Val Ser Phe Asn Phe Pro Gln Ile Thr Leu Trp Leu Gln
1 5 10

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: YES



(ix) FEATURE: 

(A) NAME/KEY: misc_feature
(B) LOCATION: 1..46 

(D) OTHER INFORMATION: /product= "PRIMER 3B" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:
GTCCCTGCAG CCAAAGCGTG ATTTGCGGGA AGTTAAAAGA GACAGT 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacillus anthracis 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 3..47

(D) OTHER INFORMATION: /product= "Primer 4A" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 

CG GGC GGT TCT GCC TTT AAC TTC CCG ATC GTC ATG GGA GGT CTG CAG 
Gly Gly Ser Ala Phe Asn Phe Pro Ile Val Met Gly Gly Leu Gln



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 

Gly Gly Ser Ala Phe Asn Phe Pro Ile Val Met Gly Gly Leu Gln
1 5 10 15

(2) INFORMATION FOR SEQ ID NO:29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: YES



(ix) FEATURE:

(A) NAME/KEY: misc_feature 

(B) LOCATION: 1..49 

(D) OTHER INFORMATION: /product= "PRIMER 4B"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 
GTCCCTGCAG ACCTCCCATG ACGATCGGGA AGTTAAAGGC AGAACCGCC 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2160 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(ix) FEATURE:

(A) NAME/KEY: CDS 

(B) LOCATION: 1..2157 

(D) OTHER INFORMATION: /product= "PAHIV#2" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:

GAA GTT AAA CAG GAG AAC CGG TTA TTA AAT GAA TCA GAA TCA AGT TCC 48

Glu Val Lys Gln Glu Asn Arg Leu Leu Asn Glu Ser Glu Ser Ser Ser
1 5 10 15

CAG GGG TTA CTA GGA TAC TAT TTT AGT GAT TTG AAT TTT CAA GCA CCC 96

Gln Gly Leu Leu Gly Tyr Tyr Phe Ser Asp Leu Asn Phe Gln Ala Pro
20 25 30 


ATG GTG GTT ACC TCT TCT ACT ACA GGG GAT TTA TCT ATT CCT AGT TCT 144 

Met Val Val Thr Ser Ser Thr Thr Gly Asp Leu Ser Ile Pro Ser Ser
35 40 45 

GAG TTA GAA AAT ATT CCA TCG GAA AAC CAA TAT TTT CAA TCT GCT ATT 192

Glu Leu Glu Asn Ile Pro Ser Glu Asn Gln Tyr Phe Gln Ser Ala Ile
50 55 60 

TGG TCA GGA TTT ATC AAA GTT AAG AAG AGT GAT GAA TAT ACA TTT GCT 240

Trp Ser Gly Phe Ile Lys Val Lys Lys Ser Asp Glu Tyr Thr Phe Ala
65 70 75 80

ACT TCC GCT GAT AAT CAT GTA ACA ATG TGG GTA GAT GAC CAA GAA GTG 288 
Thr Ser Ala Asp Asn His Val Thr Met Trp Val Asp Asp Gln Glu Val
85 90 95

ATT AAT AAA GCT TCT AAT TCT AAC AAA ATC AGA TTA GAA AAA GGA AGA 336 
Ile Asn Lys Ala Ser Asn Ser Asn Lys Ile Arg Leu Glu Lys Gly Arg
100 105 110


TTA TAT CAA ATA AAA ATT CAA TAT CAA CGA GAA AAT CCT ACT GAA AAA 384 
Leu Tyr Gln Ile Lys Ile Gln Tyr Gln Arg Glu Asn Pro Thr Glu Lys
115 120 125 

GGA TTG GAT TTC AAG TTG TAC TGG ACC GAT TCT CAA AAT AAA AAA GAA 432

Gly Leu Asp Phe Lys Leu Tyr Trp Thr Asp Ser Gln Asn Lys Lys Glu
130 135 140 

GTG ATT TCT AGT GAT AAC TTA CAA TTG CCA GAA TTA AAA CAA AAA TCT 480 
Val Ile Ser Ser Asp Asn Leu Gln Leu Pro Glu Leu Lys Gln Lys Ser
145 150 155 160 

TCG AAC ACT GCC ACT ATC ATG ATG CAA CGT GGT AAT TTT CTG CAG GGA 528 
Ser Asn Thr Ala Thr Ile Met Met Gln Arg Gly Asn Phe Leu Gln Gly
165 170 175

CCT ACG GTT CCA GAC CGT GAC AAT GAT GGA ATC CCT GAT TCA TTA GAG 576 
Pro Thr Val Pro Asp Arg Asp Asn Asp Gly Ile Pro Asp Ser Leu Glu
180 185 190 


GTA GAA GGA TAT ACG GTT GAT GTC AAA AAT AAA AGA ACT TTT CTT TCA 624 
Val Glu Gly Tyr Thr Val Asp Val Lys Asn Lys Arg Thr Phe Leu Ser 
195 200 205 

CCA TGG ATT TCT AAT ATT CAT GAA AAG AAA GGA TTA ACC AAA TAT AAA
Pro Trp Ile Ser Asn Ile His Glu Lys Lys Gly Leu Thr Lys Tyr Lys
210 215 220 

TCA TCT CCT GAA AAA TGG AGC ACG GCT TCT GAT CCG TAC AGT GAT TTC 
Ser Ser Pro Glu Lys Trp Ser Thr Ala Ser Asp Pro Tyr Ser Asp Phe 
225 230 235 240 

GAA AAG GTT ACA GGA CGG ATT GAT AAG AAT GTA TCA CCA GAG GCA AGA 
Glu Lys Val Thr Gly Arg Ile Asp Lys Asn Val Ser Pro Glu Ala Arg
245 250 255 

CAC CCC CTT GTG GCA GCT TAT CCG ATT GTA CAT GTA GAT ATG GAG AAT 
His Pro Leu Val Ala Ala Tyr Pro Ile Val His Val Asp Met Glu Asn
260 265 270 

ATT ATT CTC TCA AAA AAT GAG GAT CAA TCC ACA CAG AAT ACT GAT AGT 
Ile Ile Leu Ser Lys Asn Glu Asp Gln Ser Thr Gln Asn Thr Asp Ser
275 280 285 

GAA ACG AGA ACA ATA AGT AAA AAT ACT TCT ACA AGT AGG ACA CAT ACT 
Glu Thr Arg Thr Ile Ser Lys Asn Thr Ser Thr Ser Arg Thr His Thr
290 295 300 

AGT GAA GTA CAT GGA AAT GCA GAA GTG CAT GCG TCG TTC TTT GAT ATT 
Ser Glu Val His Gly Asn Ala Glu Val His Ala Ser Phe Phe Asp Ile
305 310 315 320 

GGT GGG AGT GTA TCT GCA GGA TTT AGT AAT TCG AAT TCA AGT ACG GTC 
Gly Gly Ser Val Ser Ala Gly Phe Ser Asn Ser Asn Ser Ser Thr Val 
325 330 335 

GCA ATT GAT CAT TCA CTA TCT CTA GCA GGG GAA AGA ACT TGG GCT GAA 
Ala Ile Asp His Ser Leu Ser Leu Ala Gly Glu Arg Thr Trp Ala Glu
340 345 350 

ACA ATG GGT TTA AAT ACC GCT GAT ACA GCA AGA TTA AAT GCC AAT ATT 
Thr Met Gly Leu Asn Thr Ala Asp Thr Ala Arg Leu Asn Ala Asn Ile
355 360 365 

AGA TAT GTA AAT ACT GGG ACG GCT CCA ATC TAC AAC GTG TTA CCA ACG 
Arg Tyr Val Asn Thr Gly Thr Ala Pro Ile Tyr Asn Val Leu Pro Thr
370 375 380 

ACT TCG TTA GTG TTA GGA AAA AAT CAA ACA CTC GCG ACA ATT AAA GCT 
Thr Ser Leu Val Leu Gly Lys Asn Gln Thr Leu Ala Thr Ile Lys Ala
385 390 395 400 

AAG GAA AAC CAA TTA AGT CAA ATA CTT GCA CCT AAT AAT TAT TAT CCT 
Lys Glu Asn Gln Leu Ser Gln Ile Leu Ala Pro Asn Asn Tyr Tyr Pro
405 410 415 

TCT AAA AAC TTG GCG CCA ATC GCA TTA AAT GCA CAA GAC GAT TTC AGT 
Ser Lys Asn Leu Ala Pro Ile Ala Leu Asn Ala Gln Asp Asp Phe Ser
420 425 430 

TCT ACT CCA ATT ACA ATG AAT TAC GGG AAT ATA GCA ACA TAC AAT TTT 
Ser Thr Pro Ile Thr Met Asn Tyr Gly Asn Ile Ala Thr Tyr Asn Phe
435 440 445 

GAA AAT GGA AGA GTG AGG GTG GAT ACA GGC TCG AAC TGG AGT GAA GTG 
Glu Asn Gly Arg Val Arg Val Asp Thr Gly Ser Asn Trp Ser Glu Val 
450 455 460 

TTA CCG CAA ATT CAA GAA ACA ACT GCA CGT ATC ATT TTT AAT GGA AAA 
Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg Ile Ile Phe Asn Gly Lys
465 470 475 480 

GAT TTA AAT CTG GTA GAA AGG CGG ATA GCG GCG GTT AAT CCT AGT GAT
Asp Leu Asn Leu Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser Asp
485 490 495

CCA TTA GAA ACG ACT AAA CCG GAT ATG ACA TTA AAA GAA GCC CTT AAA
Pro Leu Glu Thr Thr Lys Pro Asp Met Thr Leu Lys Glu Ala Leu Lys
500 505 510 

ATA GCA TTT GGA TTT AAC GAA CCG AAT GGA AAC TTA CAA TAT CAA GGG 
Ile Ala Phe Gly Phe Asn Glu Pro Asn Gly Asn Leu Gln Tyr Gln Gly
515 520 525

AAA GAC ATA ACC GAA TTT GAT TTT AAT TTC GAT CAA CAA ACA TCT CAA 
Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe Asp Gln Gln Thr Ser Gln
530 535 540

AAT ATC AAG AAT CAG TTA GCG GAA TTA AAC GCA ACT AAC ATA TAT ACT 
Asn Ile Lys Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr Thr
545 550 555 560 

GTA TTA GAT AAA ATC AAA TTA AAT GCA AAA ATG AAT ATT TTA ATA AGA
Val Leu Asp Lys Ile Lys Leu Asn Ala Lys Met Asn Ile Leu Ile Arg
565 570 575

GAT AAA CGT TTT CAT TAT GAT AGA AAT AAC ATA GCA GTT GGG GCG GAT 
Asp Lys Arg Phe His Tyr Asp Arg Asn Asn Ile Ala Val Gly Ala Asp
580 585 590

GAG TCA GTA GTT AAG GAG GCT CAT AGA GAA GTA ATT AAT TCG TCA ACA 
Glu Ser Val Val Lys Glu Ala His Arg Glu Val Ile Asn Ser Ser Thr
595 600 605

GAG GGA TTA TTG TTA AAT ATT GAT AAG GAT ATA AGA AAA ATA TTA TCA 
Glu Gly Leu Leu Leu Asn Ile Asp Lys Asp Ile Arg Lys Ile Leu Ser
610 615 620

GGT TAT ATT GTA GAA ATT GAA GAT ACT GAA GGG CTT AAA GAA GTT ATA 
Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu Gly Leu Lys Glu Val Ile
625 630 635 640 

AAT GAC AGA TAT GAT ATG TTG AAT ATT TCT AGT TTA CGG CAA GAT GGA
Asn Asp Arg Tyr Asp Met Leu Asn Ile Ser Ser Leu Arg Gln Asp Gly
645 650 655 

AAA ACA TTT ATA GAT TTT AAA AAA TAT AAT GAT AAA TTA CCG TTA TAT 
Lys Thr Phe Ile Asp Phe Lys Lys Tyr Asn Asp Lys Leu Pro Leu Tyr
660 665 670 

ATA AGT AAT CCC AAT TAT AAG GTA AAT GTA TAT GCT GTT ACT AAA GAA 
Ile Ser Asn Pro Asn Tyr Lys Val Asn Val Tyr Ala Val Thr Lys Glu
675 680 685

AAC ACT ATT ATT AAT CCT AGT GAG AAT GGG GAT ACT AGT ACC AAC GGG
Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly Asp Thr Ser Thr Asn Gly
690 695 700 

ATC AAG AAA ATT TTA ATC TTT TCT AAA AAA GGC TAT GAG ATA GGA
Ile Lys Lys Ile Leu Ile Phe Ser Lys Lys Gly Tyr Glu Ile Gly
705 710 715 
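In the listing above, each DNA codon is printed over the amino acid it encodes. As a rough illustration only (it is not part of the original disclosure), that pairing can be checked mechanically with the standard genetic code. The codon table, the `translate` helper and the sample row below are assumptions introduced for this sketch.

```python
# Illustrative sketch (not from the original document): check codon/amino acid
# pairings in the listing using the standard genetic code (NCBI table 1).
BASES = "TCAG"
# One-letter products for all 64 codons, ordered first/second/third base by TCAG.
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
# Map each DNA codon (e.g. "ATG") to its three-letter amino acid name.
CODON_TABLE = {
    first + second + third: THREE_LETTER[ONE_LETTER[16 * i + 4 * j + k]]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(codons: str) -> list[str]:
    """Translate a row of space-separated codons into three-letter names."""
    return [CODON_TABLE[codon] for codon in codons.split()]


# Hypothetical check against the first codon row of the listing above.
row = "CCA TGG ATT TCT AAT ATT CAT GAA AAG AAA GGA TTA ACC AAA TAT AAA"
print(" ".join(translate(row)))
# prints: Pro Trp Ile Ser Asn Ile His Glu Lys Lys Gly Leu Thr Lys Tyr Lys
```

Running the same helper over a fragment such as `AAG GAA AAC CAA` gives `Lys Glu Asn Gln`, the three-letter form used throughout the listing.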



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 719 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: 

Glu Val Lys Gln Glu Asn Arg Leu Leu Asn Glu Ser Glu Ser Ser Ser
1 5 10 15

Gln Gly Leu Leu Gly Tyr Tyr Phe Ser Asp Leu Asn Phe Gln Ala Pro
20 25 30

Met Val Val Thr Ser Ser Thr Thr Gly Asp Leu Ser Ile Pro Ser Ser
35 40 45 

Glu Leu Glu Asn Ile Pro Ser Glu Asn Gln Tyr Phe Gln Ser Ala Ile
50 55 60 

Trp Ser Gly Phe Ile Lys Val Lys Lys Ser Asp Glu Tyr Thr Phe Ala
65 70 75 80 

Thr Ser Ala Asp Asn His Val Thr Met Trp Val Asp Asp Gln Glu Val
85 90 95 



Leu Tyr Gln Ile Lys Ile Gln Tyr Gln Arg Glu Asn Pro Thr Glu Lys
115 120 125 

Gly Leu Asp Phe Lys Leu Tyr Trp Thr Asp Ser Gln Asn Lys Lys Glu
130 135 140

Val Ile Ser Ser Asp Asn Leu Gln Leu Pro Glu Leu Lys Gln Lys Ser
145 150 155 160 



Pro Thr Val Pro Asp Arg Asp Asn Asp Gly Ile Pro Asp Ser Leu Glu
180 185 190 



Pro Trp Ile Ser Asn Ile His Glu Lys Lys Gly Leu Thr Lys Tyr Lys
210 215 220 



Ala Ile Asp His Ser Leu Ser Leu Ala Gly Glu Arg Thr Trp Ala Glu
340 345 350 
Thr Met Gly Leu Asn Thr Ala Asp Thr Ala Arg Leu Asn Ala Asn Ile
355 360 365 

Arg Tyr Val Asn Thr Gly Thr Ala Pro Ile Tyr Asn Val Leu Pro Thr
370 375 380 

Thr Ser Leu Val Leu Gly Lys Asn Gln Thr Leu Ala Thr Ile Lys Ala
385 390 395 400 



Ser Lys Asn Leu Ala Pro Ile Ala Leu Asn Ala Gln Asp Asp Phe Ser
420 425 430 

Ser Thr Pro Ile Thr Met Asn Tyr Gly Asn Ile Ala Thr Tyr Asn Phe
435 440 445

Glu Asn Gly Arg Val Arg Val Asp Thr Gly Ser Asn Trp Ser Glu Val 
450 455 460 

Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg Ile Ile Phe Asn Gly Lys
465 470 475 480 

Asp Leu Asn Leu Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser Asp
485 490 495 

Pro Leu Glu Thr Thr Lys Pro Asp Met Thr Leu Lys Glu Ala Leu Lys 
500 505 510 



Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe Asp Gln Gln Thr Ser Gln
530 535 540 

Asn Ile Lys Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr Thr
545 550 555 560 



Asp Lys Arg Phe His Tyr Asp Arg Asn Asn Ile Ala Val Gly Ala Asp
580 585 590 



Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu Gly Leu Lys Glu Val Ile
625 630 635 640 



Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly Asp Thr Ser Thr Asn Gly
690 695 700 

Ile Lys Lys Ile Leu Ile Phe Ser Lys Lys Gly Tyr Glu Ile Gly
705 710 715 



WO 94/18332 



PCT/US94/01624 




WHAT IS CLAIMED IS: 

1. A nucleic acid encoding a fusion protein,
comprising a nucleotide sequence encoding the anthrax
protective antigen (PA) binding domain of the native anthrax
lethal factor (LF) protein and a nucleotide sequence encoding
an activity inducing domain of a second protein.

2. The nucleic acid of claim 1, wherein the second
protein is a toxin.

3. The nucleic acid of claim 2, wherein the toxin is 
Pseudomonas exotoxin A. 

4. The nucleic acid of claim 2, wherein the toxin is
the A chain of Diphtheria toxin.

5. The nucleic acid of claim 2, wherein the toxin is 
Shiga toxin. 

6. The nucleic acid of claim 1, comprising the 
nucleotide sequence defined in the Sequence Listing as SEQ ID 
NO: 5. 

7. The nucleic acid of claim 1, comprising the
nucleotide sequence defined in the Sequence Listing as SEQ ID
NO: 6.

8. A protein encoded by the nucleic acid of claim 1. 

9. A vector comprising the nucleic acid of claim 1. 

10. The vector of claim 9 in a host capable of 
expressing the protein encoded by the nucleic acid. 

11. A nucleic acid encoding a fusion protein, the 
nucleic acid comprising a nucleotide sequence encoding the 
translocation domain and anthrax lethal factor (LF) binding 
domain of native anthrax protective antigen (PA) protein and a
nucleotide sequence encoding a ligand domain which 
specifically binds a cellular target. 

12. The nucleic acid of claim 11, wherein the ligand
domain specifically binds to an HIV protein expressed on the
surface of an HIV-infected cell.

13. The nucleic acid of claim 11, wherein the ligand
domain is a growth factor.

14. The nucleic acid of claim 11, wherein the
nucleotide sequence encoding the translocation domain and LF
binding domain of the native PA protein further comprises the
nucleotide sequence encoding the remainder of the native PA
protein.

15. A protein encoded by the nucleic acid of claim 11.

16. A vector comprising the nucleic acid of claim 11. 

17. The vector of claim 16 in a host capable of 
expressing the protein encoded by the nucleic acid. 

18. A method of killing a tumor cell in a subject, 
the method comprising the steps of: 

a) administering to the subject a first fusion
protein comprising the translocation domain and LF binding
domain of the native PA protein and a tumor cell specific
ligand domain in an amount sufficient to bind to a tumor cell;
and

b) administering to the subject a second fusion
protein comprising the PA binding domain of the native LF
protein and a cytotoxic domain of a non-LF protein in an
amount sufficient to bind to the first protein, whereby the
second protein is internalized into the tumor cell and kills
the tumor cell.

19. A method of killing HIV-infected cells in a
subject, the method comprising the steps of:

a) administering to the subject a first fusion
protein comprising the translocation domain and LF binding
domain of the native PA protein and a ligand domain that
specifically binds to an HIV protein expressed on the surface
of an HIV-infected cell in an amount sufficient to bind to an
HIV-infected cell; and

b) administering to the subject a second fusion
protein comprising the PA binding domain of the native LF
protein and a cytotoxic domain of a non-LF protein in an
amount sufficient to bind to the first protein, whereby the
second protein is internalized into the HIV-infected cell and
kills the HIV-infected cell, thereby preventing propagation of
HIV.

20. A method for delivering an activity to a cell
comprising the steps of:

a) administering to the cell a protein comprising
the translocation domain and the LF binding domain of the
native PA protein and a ligand domain; and

b) administering to the cell a compound comprising
the PA binding domain of the native LF protein chemically
attached to an activity inducing moiety, whereby the compound
administered in step b) is internalized into the cell and
effects the activity within the cell.



21. The method of claim 20, wherein the ligand domain
is the receptor binding domain of the native PA protein.

22. The method of claim 20, wherein the activity
inducing moiety is a polypeptide.

23. The method of claim 22, wherein the polypeptide
is a growth factor.

24. The method of claim 20, wherein the activity
inducing moiety is an antisense nucleic acid.

25. The method of claim 20, wherein the activity
inducing moiety is a nucleic acid encoding a desired gene
product.

26. A compound comprising the PA binding domain of
the native LF protein chemically attached to a non-LF activity
inducing moiety.

27. The composition of claim 26, wherein the activity
inducing moiety is a polypeptide.

28. The composition of claim 26, wherein the activity 
inducing moiety is a radioisotope. 

29. The composition of claim 26, wherein the activity
inducing moiety is an antisense nucleic acid.

30. The composition of claim 26, wherein the activity
inducing moiety is a nucleic acid encoding a desired gene
product.

31. The nucleic acid of claim 11, comprising the 
nucleotide sequence defined in the Sequence Listing as SEQ ID 
NO: 11. 



32. A nucleic acid comprising a nucleotide sequence 
encoding an anthrax protective antigen which is altered to 
include a cleavage site recognized by a protease produced by 
an intracellular pathogen. 

33. The nucleic acid of claim 32 wherein the
intracellular pathogen is a virus. 



34. The nucleic acid of claim 33 wherein the
alteration comprises a mutation in at least one of amino acid
residues 164-167 (the trypsin cleavage site).

35. The nucleic acid of claim 34 wherein the virus is
a retrovirus.

36. The nucleic acid of claim 35 wherein the
retrovirus is an HIV.

37. The nucleic acid of claim 36 wherein the amino
acids at residues 164-167 are replaced with an amino acid
sequence selected from the group comprising NTATIMMQRGNF,
QVSQNYPIVQNI, TVSFNFPQITLW, and GGSAFNFPIVMGG.

38. A polypeptide comprising an amino acid sequence
encoding an anthrax protective antigen which is altered to
include a cleavage site recognized by a protease produced by a
retrovirus.

39. The polypeptide of claim 38 wherein the 
alteration comprises a mutation in at least one of amino acid 
residues 164-167 (the trypsin cleavage site). 



40. The polypeptide of claim 39 wherein the 
retrovirus is an HIV. 

41. The polypeptide of claim 40 wherein the amino 
acid residues 164-167 are replaced with an amino acid sequence 
selected from the group comprising NTATIMMQRGNF, QVSQNYPIVQNI, 
TVSFNFPQITLW, and GGSAFNFPIVMGG. 

42. A method of killing a cell which is infected with
an intracellular pathogen, the method comprising:

applying to the cell a composition comprising an
effective amount of an altered anthrax protective antigen (PA)
having a cleavage site recognized by a protease produced by
the intracellular pathogen.

43. The method of claim 42 wherein the cleavage site
is at amino acid residues 164-167. 

44. The method of claim 42 wherein the intracellular
pathogen is a virus.

45. The method of claim 44 wherein the virus is a
retrovirus.

46. The method of claim 45 wherein the retrovirus is an
HIV.

47. The method of claim 46 wherein the amino acids at
residues 164-167 are replaced with an amino acid sequence
selected from the group comprising NTATIMMQRGNF, QVSQNYPIVQNI,
TVSFNFPQITLW, and GGSAFNFPIVMGG.

48. The method of claim 42 wherein the cell is
harbored in a human.

49. The method of claim 48 wherein the step of
applying the composition includes parenterally administering
the composition to the human.

50. The method of claim 49 wherein the parenteral 
administration is intravenous. 

51. The method of claim 48 wherein the effective
amount of altered protective antigen is from about 5 to about
25 micrograms per kilogram of body weight of a human harboring
the infected cell.

52. The method of claim 51 wherein the effective
amount of altered protective antigen is about 10 micrograms
per kilogram of body weight of a human harboring the infected
cell.